ABSTRACT


The Personal Database Management System is a hardware
nology. The design is based upon how people manage their

One of the factors which limits human performance is the
limited capacity of human memory. Memory is commonly
considered to be divided into two parts: short-term and
long-term. Short-term memory is that part which we can
consciously access; it may be compared to the primary store
of a computer. It is characterized by rapid access and
volatility. Long-term memory is analogous to secondary
storage in that it is more permanent in nature than
short-term memory and it requires more time and effort to
record information to and retrieve information from [1].

Short-term memory is a major limiting factor on human
performance because it is the memory which is consciously
accessible, and thus our working memory, and it is very
limited in its capacity. This memory holds units of
information for up to thirty seconds. That period may be
extended through repetition and rehearsal. The size of
short-term memory is approximately seven units of
information (plus or minus two). The nature of these units
is a function of experience and training. For example,
someone familiar with English may find it easy to remember
seven English words but difficult to remember seven Chinese
ideograms. Thus it is easy to see that the information
processing capacity of humans can be easily overloaded.
Long-term memory limits performance because of the time and
effort associated with fetches from and stores to it [1].
The idea behind a Personal Database Management System
(PDBMS) is to provide an extension to both short-term memory
and long-term memory. A good PDBMS should provide its users
with means of storing information and later retrieving it
that are faster and more efficient than ordinary human
means. Long-term memory can be extended by allowing users
to easily store information which they find difficult to
memorize. Numerical items such as phone numbers, safe
combinations, and part numbers are examples of information
which is usually expensive in the amount of effort required
to ensure that it is not soon forgotten. Short-term
memory can be extended by providing users with a way to
relieve the burden upon its capacity. Instead of having to
remember a piece of information or a key (or cue) for
retrieving the desired information, a PDBMS can accept the
key as input and retrieve the desired information. Once the
key has been entered into the system, it may be forgotten,
freeing a portion of short-term memory for more information.
Also, retrieved information need not be memorized if the
PDBMS records it in a manner which allows it to be easily
accessed. For example, information recorded on a piece of
paper or on a display screen need not be memorized if it is
within easy reach.

What should be the characteristics and what are the
requirements of a Personal Database Management System?
Because it is designed for the storage and retrieval of
personal information, it is a single-user system. In order
to be useful to a broad range of people, it should permit
interaction at different levels, depending on the
sophistication of the user. Novice users will be easily
discouraged and see very little benefit if a system appears
to be illogical and complicated. Also, because of the
personal nature of the information in the database, the
system should provide security for that information.
Finally, in order to be acceptable, it should be small,
lightweight, and inexpensive.

This last requirement was taken to indicate that such a
system should be built using a battery-driven
microprocessor. Current microprocessor technology provides
more computer power than is needed strictly for a PDBMS, so
the design presented here incorporates the following
additional capabilities: 1) the ability to be used as a
calculator, 2) the ability to be programmed by the user, and
3) the ability to be connected into networks or to other
devices via an RS232 serial interface.

The PDBMS is programmed in a non-standard version of
FORTH. The particular one used here is neither fig-FORTH
nor FORTH-79, the two most prevalent versions of FORTH.
However, the basis for the language used is 8080 fig-FORTH,
version 1.3, which was partially modified to conform with
the FORTH-79 standard [2]. Further modifications were made
to this based upon hardware characteristics and the
suggestions and ideas of various members of the FORTH
Interest Group. In spite of this, when referred to in this
thesis, the language used in the PDBMS will be called FORTH.
One major distinction should be made, however: the PDBMS's
base vocabulary is called ROOT, not FORTH.
A. BACKGROUND


The largest part of the information presented in this
chapter was derived from detailed study of four personal
address books (Appendix B contains detailed statistics from
this study). Address books were used as a basis for the
preliminary investigation of personal databases because they
were found to be more structured, standardized, and easily
computerized than other personal databases (e.g., shopping
lists, appointment calendars, and things-to-do lists).

The people (some of whom worked with computers daily)
interviewed during the study indicated that the maintenance
of personal databases is not analogous to management of
databases by computer. Indeed, the ways in which a database
management system (DBMS) is structured, maintained, and used
are very different from the way people manage their personal
information. The results of the author's studies and
interviews seem to indicate that the essential difference
between DBMSs and personal information management is the
number of "system" users. It is this difference that is the
apparent cause of almost all of the other differences.

Because DBMSs are normally organizational tools with
many users, records, fields, attribute values, query
languages, keys, etc., they must be standardized.
Organizational data is entered and retrieved by many
different individuals, and thus without standardization it
would be difficult for one person to know of information
entered into the system by another, much less retrieve it.
On the other hand, personal information is shared by only a
few people, if any. An important point here is that in a
situation where there is only one user, that user knows
(or knew at one time) all of the information in the system
because he entered it. People record and maintain personal
information in an auxiliary store in order to relieve
themselves of some of the burdens of recall and recognition.
Because long-term memory is generally considered to be
permanent [1], the data recorded in auxiliary stores need
not be a verbatim copy of the information which is to be
retrieved later. Truly personal information needs only to
contain enough context-specific cues to enable a person to
reconstruct or recall the structure of their semantic
memory.

"The Recognition of Previous Encounters," by George
Mandler [3], describes semantic structures as an
organization of memory (referred to as a "familiarity
variable"). These structures represent the familiarity of
events (and of the entities which are part of an event), and
are unique to each particular event. Further, they are
independent of the context in which the event occurs or in
which it is embedded. Two sets of independent processes
operate upon semantic structures: intra-event processes,
which are referred to as "integration," and inter-event
processes, which relate an event to others and are called
"elaboration." Mandler's hypothesis is that recognition is
related to integration, which is developed through attentive
repetition (rote learning). Recall is related to
elaboration, which is strengthened by the establishment of
relational links between the target event and other
representations in memory¹. Mandler does not describe how
integration and elaboration manifest themselves except in an
abstract way. They must involve the establishment of cues
which act as keys to semantic structures, whether the access
is direct (as one would expect in the case of integration)
or indirect (as might be the case for elaboration). It is
these cues which must be available to a person in order to
retrieve the desired events and entities. It is this that
makes personal databases different from DBMSs.

----------

¹Recognition is the process of going from a familiar
event to the context which caused the event to be
remembered. Recall is the opposite process, that is,
remembering an event from its context. When a person
attempts to remember where he knows a familiar face from, he
is attempting recognition. Recall is what a person attempts
to do when he knows his wife told him to get something on
the way home, but has forgotten what.

Even though only the minimum number of cues need be
saved in order to retrieve information, the author's studies
revealed that usually more than the minimum required cues
are recorded. For example, there is usually no need to
record one's parents' city and state of residence, yet every
address book contained this, as well as other unnecessary
information. This is probably due in part to the fact that
address books are not always personal databases; sometimes
they are family documents. Appointment calendars appeared
to be the tersest of all the personal databases studied. An
example entry for March 10 might be "Rebecca 11:30," which
is a reminder that Rebecca has an appointment with Dr.
Feeney at the Pediatric Group, 698 Cass Street, 11:30 A.M.,
on March 10th.

In order to establish a common ground for comparison,
the following terms will be used throughout this thesis.

• Personal Database Management System (PDBMS): a
  computer-based system for managing personal information.
  The information managed by this system is organized into
  files containing records.

• Manual Database (MDB): a manually maintained file of
  personal information. Because these databases are
  normally not systematically managed as a group, there is
  no MDBMS analogous to a PDBMS. Each MDB is separate and
  distinct from all other MDBs; an address book, an
  appointment book, etc., are each MDBs.

• File: a relationship between records. An MDB is a
  file. All records in a file are of the same format and
  are related by their grouping into the same file.

• Record: an entry in a file. In an address book, each
  time a person or an organization is added to the
  "address book file," a new record is added.

• Field: an entry in a record. In general, all records
  in the same file have the same fields (and thus
  structure). In an address book, the fields are usually
  called "name," "street," "city, state, and zip code,"
  and "telephone number."


B. GENERAL CHARACTERISTICS


As stated before, people do not generally view personal
data as a database in the same sense as information in a
computerized database. Each MDB tends to be viewed as a
distinct entity, unrelated to any other MDB. Thus there is
no notion of a database management system (DBMS) since the
MDBs are not managed together as a group. As a result there
is often redundant information in MDBs when they are viewed
as a group. For example, an address book and an appointment
calendar probably both contain redundant information about
an individual's insurance agent, realtor, dentist, etc.
Even though the possibility for joins and Cartesian products
exists, not only are they not performed, but the concepts
behind these operations are apparently incomprehensible to
the layman.

The existence of separate MDBs or files can be
intuitively explained by three reasons. The first, and most
obvious, is that the effort required to maintain even a
partially integrated database manually exceeds the value
gained by having such a database. Maintaining such a
database requires the establishment of all possible desired
relationships before the implementation of the database,
followed by the maintenance of complicated and troublesome
cross-indexes. Less effort is required to check one's
appointment book for appointments and then go to one's
address book to obtain the phone number to call in order to
confirm an appointment or, if the requirement for a
confirmation was foreseen, to simply duplicate the phone
number in the appointment book.

The second reason is more subtle and might be related to
the ideas expressed in reference [3]. Even though the same
entity (person, organization, etc.) may be included in more
than one file, the different occurrences may represent
different views of that same entity; that is, file entries
are context-sensitive. When comparing address book records
to appointment calendar records, it is very common to find
that the address book entry for an individual is more formal
than an appointment book entry for the same individual. For
example, "Richard Elton" might appear as "Richard and May
Elton" in an address book, "Rich" in an appointment book,
and "Lt. Elton" in a personal note. This context-sensitive
nature of entries seems to indicate that integrating a
personal database is much more difficult than in the case of
traditional DBMSs.

The last reason is that inconsistencies between personal
MDBs (i.e., files) due to replication (redundancy) of data
are easily managed. This is not only because of the
individual and aggregate file sizes, but also because of the
nature of the data. The issue of size is obvious; the
important characteristic of the data which aids in solving
the problems of inconsistency is that the keys used for
access are closely related, if not identical, to cues used
to reconstruct semantic structures. For example, when a
person receives a change to his friend Pat's phone number,
it will probably prompt him to make a change in his
address/phone book. What changed was not the entity "Pat"
but just a value of one of the entity's attributes. So, for
the most part, the cues (which are context-free) associated
with "Pat" remain unchanged. There is a good possibility
that not all occurrences of the old phone number will be
updated. Later, when he comes across an occurrence of the
old number, it will elicit many of the same cues related to
"Pat" as would the address book entry. Chances are that he
will remember that the number was changed and was recorded
in his address/phone book. It will be then that the
inconsistency will be corrected, if it is at all. Perhaps
people rely upon this and intentionally do not make any
great effort to seek out inconsistencies.
1. Files


Manually maintained files are apparently organized
in two ways: sequential access and direct-keyed access.
MDBs which are direct-key accessed are normally recorded in
a commercially procured file or document. Examples of these
files are address books, which are designed to be keyed on
the first letter of a surname in the "name" field, or
appointment books, which are designed to be keyed on a date.
Sequentially maintained files are commonly kept on less
rigidly structured media such as notepads, chalk boards, or
scraps of paper. Information is usually entered
chronologically. Shopping lists, things-to-do lists, etc.,
are examples of sequentially organized files. Another
distinction between the two file types is the time-value of
the information stored in them. Indexed files usually
contain information which is to be retained for a longer
period of time than that contained in sequential files. It
was not uncommon to find address book entries which were
more than ten years old.
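
The two organizations can be contrasted in a short Python
sketch. It is illustrative only: the class names KeyedFile
and SequentialFile, and the sample entries, are assumptions
and do not come from the study.

```python
from collections import defaultdict


class KeyedFile:
    """Direct-keyed file, like an address book with lettered tabs."""

    def __init__(self):
        self.tabs = defaultdict(list)  # first letter of surname -> entries

    def add(self, surname, entry):
        self.tabs[surname[0].upper()].append((surname, entry))

    def lookup(self, surname):
        # Only the one tab selected by the key is searched.
        tab = self.tabs.get(surname[0].upper(), [])
        return [entry for name, entry in tab if name == surname]


class SequentialFile:
    """Sequential file, like a notepad: entries kept in order of arrival."""

    def __init__(self):
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)

    def scan(self, text):
        # Retrieval means reading every entry from the top.
        return [entry for entry in self.entries if text in entry]


book = KeyedFile()
book.add("Elton", "Richard and May Elton")
todo = SequentialFile()
todo.add("call Rich")
todo.add("buy milk")
```

The keyed file goes straight to one tab, whereas the
sequential file must be read from the top, which is
tolerable only because such files tend to be short-lived
and small.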


2. Records 


With the exception of personal notes, records within
any particular file tended to be fairly uniformly formatted.
There is generally a core of fields which contain a value in
almost all records. However, many records contained
additional fields beyond the "core-fields." In the case of
address books these fields were inserted into the
pre-printed record formats by writing them vertically,
placing them in an unused, unrelated field, or placing them
into another record. The "core-fields" in address books
are: "name," "street," "city," "state," "zip code," "area
code," and "telephone exchange and number." Typical
additional fields contain information such as:


• Account, Model, Serial, Policy, and Social Security
  Numbers.

• Additional Phone Numbers (e.g., "home," "work,"
  "parts department," "service," "account inquiries,"
  etc.).


• Birthdays and Anniversaries.

• Additional Names (e.g., children's names, points of
  contact).

• Cards and Favors Sent and Received.


• Additional Miscellaneous Information (e.g., "When in
  Seattle," "Neighbors in Monterey," or "Uncle Bob's
  brother-in-law").


In the case of address books, record deletion
appears to be an unpredictable event and probably a function
of the medium upon which the record is recorded. Bound
address books contain many more entries whose validity is
questionable. Many of these appear to be retained not only
because they were entered in ink, thereby making deletion a
messy affair, but also for sentimental reasons. Many of the
very old entries are for high school and childhood friends.
Address books which permit easy deletion of records appear
to contain fewer old entries, but because deletions are not
recorded it is not easy to attribute this effect to the ease
of deletion.
3. Fields


Even though the fields' types and numbers appear to
be fairly standardized, the contents of the fields are not.
Fields appear to be variable length with no restriction on
content. Graphic, non-alphanumeric symbols such as hearts,
check-marks, and "happy faces" are not uncommon. Some files
contain indicators of the validity of the information in the
field (e.g., "?" or "as of Dec 81"). Abbreviations are not
consistently used in the same file; for example, one address
book examined contained all of the following entries:


Street ot e She ig 

Avenue Ave. 

Virginia ViEgs Va VA 
We. & Mrs. pag joni Poe and ates . 


C. DESIGN IMPLICATIONS 


It appears obvious that a PDBMS and a DBMS are not the
same. As such, it is reasonable to construct a PDBMS
differently from a DBMS. Because a PDBMS is used as an aid
to recall contexts from memory, and the cues to these are
unique to each context [3], not only should the system have
no restrictions such as fixed field lengths and attribute
values, but additionally it should:

• Allow the user to use any word as a key.

• Be able to recognize and compensate for misspelled keys.

• Be able to take into account keys which are synonyms and
  refer to the same entity (for examples, see the
  description of fields, above). Also it should have the
  ability to discriminate between homonyms which appear to
  be the same but refer to different attributes or entities
  (for example, "CT," as an abbreviation for "Court" in a
  street address versus "CT," as an abbreviation for
  "Connecticut").
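
The key-handling requirements listed above can be sketched
in Python as follows. This is a minimal illustration under
stated assumptions, not the method used in the PDBMS: the
abbreviation table and the function names normalize_key and
find_key are hypothetical, and misspellings are handled with
the standard library's difflib similarity matching, a
technique swapped in here for the purpose of the example.

```python
import difflib

# Hypothetical abbreviation table. The same abbreviation expands
# differently depending on the field in which it appears.
ABBREVIATIONS = {
    ("street", "CT"): "COURT",
    ("state", "CT"): "CONNECTICUT",
    ("state", "VA"): "VIRGINIA",
}


def normalize_key(field, key):
    """Upper-case the key, drop a trailing period, expand abbreviations."""
    key = key.upper().rstrip(".")
    return ABBREVIATIONS.get((field, key), key)


def find_key(field, key, known_keys, cutoff=0.8):
    """Return the stored key that best matches the user's key, or None."""
    wanted = normalize_key(field, key)
    if wanted in known_keys:
        return wanted
    # Assumed stand-in for spelling correction: closest match by similarity.
    close = difflib.get_close_matches(wanted, known_keys, n=1, cutoff=cutoff)
    return close[0] if close else None


states = ["CONNECTICUT", "VIRGINIA"]
streets = ["COURT", "STREET"]
```

Because the field name is part of the lookup, "CT" resolves
to a different entry in a street field than in a state
field, which is the homonym discrimination called for above.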
When interviewing laymen, it was found that they easily
understand the concepts of "file" and "record," but not
"field." This suggests that perhaps people conceptualize an
entity as a synergistic sum of its attributes rather than as
a relationship between attributes. Thus a record is the
smallest logical unit with which people normally deal
because it, as a whole, contains the cues necessary to
reconstruct semantic structures. The number of fields in a
record may be related to an individual's ability to
"integrate" the corresponding semantic structure [3].

Because a PDBMS is an aid to an individual's recall, it
should faithfully preserve information entered and retrieve
it by logical means. If text compression or compaction² is
employed it must be transparent to the user. Logical
retrieval means that if the user feels that he has given
sufficient information to specify the desired data, the
system should be able to either retrieve the data or give a
comprehensible reason why it could not be retrieved.

A PDBMS should be "user friendly" and require very
little effort on the part of the user. This means that
persons who have no need or desire to understand computers,
DBMSs, etc., should be able to use the system. Further, the
file, record, and field formats should be easily specified
without the need for a plethora of technical details. Entry
and retrieval of data should also be fast and easy. Most
people who are not specifically trained on computers tend to
have much less tolerance for poorly engineered computer
systems, or ones requiring technical expertise, than do the
system's designers or computer scientists [4]. Above all, a
computerized system must be better in every way than the
corresponding manual system [1].

----------

²Text compression and compaction involve removing
redundant information from text so that it can be stored
using fewer resources than if the original text had been
stored. The difference between the two is that an exact copy
of the original text is recoverable after compression,
whereas it cannot be recovered after compaction.
A. SOFTWARE


When the user first receives the PDBMS, he sees only two
functions: a calculator and a database management system.
As the user learns how the system works, it is possible for
him to expand the system incrementally until eventually he
can reprogram a large portion of the system itself in FORTH
and/or assembly language.

Many of the keys on the PDBMS's keyboard are
programmable. They are initially used to allow the user to
enter commands by simply pushing a key. Instead of typing
"RECORD" when using the database management function, the
user needs only to push the "SHIFT" and "R" keys and the
system will enter the word "RECORD" for him.


1. The Calculator Function 


The calculator which the user initially receives is
much like any other calculator. Two major ways in which
this function differs from most standard calculators are
that a series of arithmetic operations may be entered at
once, and that the user may create and use variables. Unlike
most calculators, the action of most of the keys on the
PDBMS is simply to enter textual data into the system. The
PDBMS does not interpret most of the input until the ENTER
key is pressed. So the following two key sequences have the
same effect, i.e., to add two to three and obtain five.

        2               2
        <enter>         <space>
        +               +
        <enter>         <space>
        3               3
        <enter>         <space>
        <enter>         <enter>


As in FORTRAN, variables are created when they are
first used. If a word or a character is found in the input
which the calculator cannot recognize and it is to the right
of an equal sign, the calculator assumes that it is a
variable declaration and creates a variable. If an
unrecognizable word or character is encountered to the left
of an equal sign, an error condition is signalled.
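
The variable rule just described can be sketched in Python.
This is only an illustration: the class Calculator, the
variable name TOTAL, the "expression = NAME" line format,
and strict left-to-right evaluation are all assumptions made
for the example, not details taken from the PDBMS.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


class Calculator:
    def __init__(self):
        self.variables = {}

    def _value(self, token):
        """Resolve a token found to the left of an equal sign."""
        if token in self.variables:
            return self.variables[token]
        try:
            return float(token)
        except ValueError:
            # Unrecognizable word left of '=': signal an error condition.
            raise NameError(f"unrecognized word: {token}") from None

    def enter(self, line):
        """Interpret one buffered line, as if ENTER had been pressed."""
        left, equals, right = line.partition("=")
        tokens = left.split()
        result = self._value(tokens[0])
        # Assumption: operators apply strictly left to right.
        for op, operand in zip(tokens[1::2], tokens[2::2]):
            result = OPS[op](result, self._value(operand))
        if equals:
            # A word right of '=' is created as a variable on first use.
            self.variables[right.strip()] = result
        return result


calc = Calculator()
calc.enter("2 + 3 = TOTAL")  # TOTAL did not exist; it is created here
```

Once created, TOTAL may appear to the left of an equal sign
in later lines, while a name that was never created raises
the error condition.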


2. The Database Management Function 


The database management function allows the user to
create files and records, delete files and records, retrieve
records, and use keys (i.e., passwords) to seal records and
other keys as a means of providing data security. The user
is not required to deal directly with the technicalities of
database data structures; he only needs to know that files
are a collection of records, all having the same format.
Files appear to the user to be separate and disjoint,
similar to MDBs. The procedure for creating a file requires
only that the user specify the file's name and the names of
the fields within the records of the file. The user is led
through the process of file creation and record retrieval by
system prompts.

Records may be retrieved by using any word (or group
of words) contained within them. The only restriction on
this is that the user must specify which field is to be
searched for the target word(s). This restriction should
not seem unnatural to the user but, rather, necessary.
Because any word is a possible key attribute, the user must
be able to specify the context of the target word. By
specifying the field name with queries, the user is able to
retrieve a record using Mr. York's last name without also
retrieving all of the records containing "New York."
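
Field-qualified retrieval of this kind can be sketched in
Python. The function names, the record layout (a dictionary
per record), and the second sample record are assumptions
made for illustration; only the "York" versus "New York"
case comes from the text.

```python
import re


def wordds(text):
    """Upper-cased alphanumeric words of a field, as a set."""
    return set(re.findall(r"[A-Z0-9]+", text.upper()))


def retrieve(records, field, target):
    """Return records whose named field contains every target word."""
    wanted = wordds(target)
    return [r for r in records if wanted <= wordds(r.get(field, ""))]


# Hypothetical address-book file with two records.
address_book = [
    {"name": "Mr. York", "city": "Monterey"},
    {"name": "Pat", "city": "New York"},
]
```

Searching the "name" field for York returns only the first
record; the same word in the "city" field returns only the
second.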


B. DATA STRUCTURES


The PDBMS uses some data structures which might be
considered unusual when compared to other database
applications. Some of these are characteristic of FORTH and
others are used because of the nature of the system.


1. Dictionaries


Two different dictionary structures are used in the
PDBMS. One dictionary is that which is associated with
FORTH. The second is conceptually more like a dictionary
as a layman might think of one. The FORTH dictionary is
simply a linked list of FORTH definitions. The definitions
are maintained in chronological order by their time of
creation. These definitions typically describe the following
basic FORTH word-types: colon definitions, constants,
variables, user variables, and vocabularies. Colon
definitions are FORTH definitions which are defined in terms
of previously created definitions, similar to procedures and
functions in other languages. Vocabularies are
"sub-dictionaries" and are used to delimit the scope of
definitions.

The other dictionary is called the DB dictionary and
it is used to store the words entered and contained in the
database. Words are entered into the dictionary and
looked up by hashing to a linked list using the first letter
or digit of the target word, and then traversing the list,
which is alphabetically sequenced. Punctuation is not
entered in the DB dictionary.
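
A minimal Python sketch of such a dictionary follows. It is
an illustration under assumptions: a Python list stands in
for each linked list, the class and method names are
hypothetical, and each word is given an identifying number
in order of creation so that the number stays valid when
later words are inserted ahead of it alphabetically.

```python
class DBDictionary:
    def __init__(self):
        # Initial letter or digit -> alphabetically ordered (wordd, id) list.
        self.buckets = {}

    def enter(self, wordd):
        """Insert a word if new and return its number within its list."""
        bucket = self.buckets.setdefault(wordd[0], [])
        for position, (existing, ident) in enumerate(bucket):
            if existing == wordd:
                return ident          # already stored
            if existing > wordd:
                break                 # insert before this entry
        else:
            position = len(bucket)    # insert at the end
        ident = len(bucket)           # assumed: numbers issued in sequence
        bucket.insert(position, (wordd, ident))
        return ident

    def lookup(self, initial, ident):
        """Traverse the list hashed to by `initial` to find word `ident`."""
        for wordd, candidate in self.buckets.get(initial, []):
            if candidate == ident:
                return wordd
        raise KeyError((initial, ident))


db = DBDictionary()
york = db.enter("YORK")
yes = db.enter("YES")
```

The pair (initial, number) is enough to recover a word,
which is what allows a field to store each word in two
bytes.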


2. Files


Files are completely inverted. They contain only
administrative data, and indices and pointers into the DB
dictionary. Information which is retrieved from the
database is reconstructed a word at a time by looking words
up in the dictionary (punctuation is stored directly in the
database in its ASCII format). Memory for files, the DB
dictionary, and sealed keys (discussed later) is allocated
from a heap, so none of these data structures occupies
contiguous memory. A file is defined as a FORTH vocabulary
and its definition contains pointers to the first and last
records in the file. Records are maintained as a circular,
doubly linked list. The fields are defined as FORTH
constants in their respective file's vocabulary. Their
value is an ID number which is used to relate the fields in
the database to the names assigned to them by the user.
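
The file structure described above can be sketched in
Python. The class and attribute names are hypothetical, and
Python object references stand in for the heap pointers and
FORTH vocabulary entries of the actual system.

```python
class Record:
    def __init__(self, fields):
        self.fields = fields
        self.prev = self   # a lone record links to itself
        self.next = self


class File:
    def __init__(self, name, field_names):
        self.name = name
        # Each field name maps to an ID number, like a FORTH constant.
        self.field_ids = {n: i for i, n in enumerate(field_names)}
        self.first = None
        self.last = None

    def append(self, record):
        if self.first is None:
            self.first = self.last = record
            return
        # Splice the new record between the old last and the first.
        record.prev = self.last
        record.next = self.first
        self.last.next = record
        self.first.prev = record
        self.last = record

    def __iter__(self):
        node = self.first
        while node is not None:
            yield node
            node = node.next
            if node is self.first:   # back at the start of the circle
                break


phones = File("PHONES", ["name", "number"])
for entry in (["Pat", "555-0101"], ["Rich", "555-0102"], ["May", "555-0103"]):
    phones.append(Record(entry))
```

Because the list is circular, a traversal can begin at any
record and proceed in either direction until it returns to
its starting point.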


3. Logical Records


To the user a record appears to be a collection of
information related to a particular entity. The fields help
to organize the data by grouping it. The logical record
itself is variable in length. The first set of bytes in a
record contains the record's access descriptor, which is
variable in length. This is followed by the links (or
pointers) to the previous and next records in the file.
Following these pointers are the fields, which are fixed in
number (as determined in the file's definition), but are
each variable in length. Fields are separated by an
end-of-field (EOF) marker. Because records contain a fixed
number of fields, the last EOF serves as an end-of-record
marker.
4. Fields


Fields are a continuous string of bytes which represent
the data contained in the field. Punctuation appears
in its ASCII format (one character per byte). Words are
represented by two bytes: the first contains the word's
initial letter (or digit), which is used to hash into the DB
dictionary. The second byte is a number used to identify
the particular member of the hashed-to linked list that
represents the target word.
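
The two-byte word representation can be sketched in Python.
The sketch makes several assumptions that are not stated in
the text: input is ASCII and is converted to upper case, a
plain dictionary of lists stands in for the DB dictionary,
and a byte is taken to begin a word reference whenever it is
an ASCII letter or digit. The function names are
hypothetical.

```python
import re


def encode_field(text, dictionary):
    """Encode a field: two bytes per word, one ASCII byte per punctuation."""
    out = bytearray()
    for match in re.finditer(r"([A-Z0-9]+)|([^A-Z0-9])", text.upper()):
        word, punctuation = match.groups()
        if word:
            bucket = dictionary.setdefault(word[0], [])
            if word not in bucket:
                bucket.append(word)
            # Initial character, then the word's number within its list.
            out += bytes([ord(word[0]), bucket.index(word)])
        else:
            out += punctuation.encode("ascii")
    return bytes(out)


def decode_field(data, dictionary):
    """Rebuild the text a word at a time from the dictionary."""
    out, i = [], 0
    while i < len(data):
        ch = chr(data[i])
        if ch.isascii() and ch.isalnum():
            out.append(dictionary[ch][data[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


words = {}
encoded = encode_field("New York, NY", words)
```

With one byte for the word number, this scheme can address
at most 256 distinct words per initial character.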
5. Keys


Keys may be thought of as passwords which are used
to secure records, FORTH screens, and other keys (called
sealed keys). These objects (i.e., records, screens, and
keys) all have access descriptor fields which contain
information about what keys are necessary to access the
particular object. Keys allow the user to construct fairly
complex access mechanisms.


C. HARDWARE 


Figure 3.1 is a simple picture of the layout of the
PDBMS's hardware. The system makes extensive use of CMOS
technology so that it can be battery driven. There are six
major components in the system.


1. Erasable Programmable Read-Only Memory


Erasable programmable read-only memory (EPROM) occupies
the system's low memory and contains the PDBMS's
operating system. There are 16K bytes of EPROM in the
system. As its name implies, its contents cannot be altered
by the user.
Figure 3.1 PDBMS Hardware Configuration.

Random access memory (RAM) is used by the user as
his workspace. System parameters and data structures which
change according to the runtime environment are also
maintained in RAM. There are 16K bytes of RAM.


3. Electrically Erasable Programmable Read-Only Memory


Electrically erasable programmable read-only memory
(EEPROM or E2PROM) serves as the system's secondary storage.
The unique characteristic of E2PROM is that it can be erased
(i.e., written into) under software control, as RAM can, but
it is non-volatile (i.e., its contents are not lost when the
power is turned off). Part of the E2PROM is not accessible
to the user because it is used by the system for E2PROM
memory management, and database management and storage.
What is not used by the system is available to the user as
FORTH screens.


The liquid crystal display (LCD) serves as the
system's console. It contains two rows of 20 characters.
It is attached directly to the system's bus and any data
written into memory beginning at address C000H appears on
the LCD. The keyboard provides the means by which the user
can directly input data into the system. It is connected to
the system's bus via a parallel I/O port.
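
The memory-mapped display can be modelled in a few lines of
Python. The base address and the display dimensions come
from the text; the row-major, contiguous arrangement of the
two rows and the helper names are assumptions made for the
sketch, and a bytearray stands in for the system's address
space.

```python
LCD_BASE = 0xC000      # first display address, from the text
ROWS, COLS = 2, 20     # two rows of 20 characters


def lcd_address(row, col):
    """Address of a character cell, assuming rows are stored back to back."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("cell is outside the display")
    return LCD_BASE + row * COLS + col


def write_lcd(memory, row, text):
    """Store one padded, truncated line of text in the display area."""
    line = text[:COLS].ljust(COLS).encode("ascii")
    start = lcd_address(row, 0)
    memory[start:start + COLS] = line


memory = bytearray(0x10000)   # simulated 64K address space
write_lcd(memory, 0, "OK")
```

Under this assumed layout the display occupies the 40 bytes
from C000H through C027H.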
5. Central Processing Unit


The PDBMS uses an NSC800 microprocessor operating at
a clock rate of 1 MHz. This is a CMOS microprocessor which
is downwardly compatible with the Z80. It was chosen as the
system's CPU because of its low power consumption and the
availability of software. The slow speed is not an issue
with this system because of the


human-computer communications. 


This port allows the user 


with other systems. 
A. CONVENTIONS AND NOTATION


FORTH words do not lend themselves to being set off by
quotation marks, so instead they will appear in upper-case
boldface. However, because boldface punctuation is often
hard to distinguish from standard text punctuation, the
following eight FORTH words will be enclosed in braces:
e @ g ’ 


Additionally FORTH words composed entirely of strings of 
these characters will be enclosed in braces (for example, 
tee })- 

Finally, to avoid ambiguity, the following conventions 
will be used for the three words "key," "word," and 
"dictionary." When there is a possibility of confusing the 
FORTH meaning of "word" (described below) and the accepted 
computer term "word" (i.e., two bytes or 16 bits on the 8080 
and Z80 microcomputers), the former will be called a 
"word" or a "FORTH word," whereas the latter will not 
be used; instead "two bytes" will be used. Adding further 
possibilities for confusion is a third meaning of "word." 
This third meaning is the usual English connotation of 
"word," and these "words" are data in the PDBMS. The 
ubiquitous FORTH response, "OK," and words entered by the user 
as responses to the system prompts and as data to be included 
in the database are "words" in this third class. Data 
words of this type will be called "uwords." Because uwords 
entered into the database may be altered before they are 
entered into the database dictionary, the words which reside 
TABLE I 
BNF Definition of Uword and Wordd 

uword       ::= <wordd><punctuation> | <punctuation> 
punctuation ::= - | / | * | = | <space> | , | ( | ) | ... etc. 
space       ::= 20H 
wordd       ::= <wordd><char> | <char> 
char        ::= 0 | 1 | 2 ... | X | Y | Z 


in the database dictionary will be referred to as "wordds." 
Table I shows the BNF definitions of both uword and wordd. 

In order to distinguish between a "key" on the keyboard 
and a "Key" which is used as a password to SEAL and UNSEAL 
data objects, the latter "Key" will always begin with a 
capital "K." Finally, many of the system data 
structures are maintained as FORTH dictionaries 
(also referred to as vocabularies), but wordds are stored in 
a structure which is not a FORTH dictionary but which 
may also rightfully be called a dictionary. The following 
convention will therefore be followed: when the possibility 
of ambiguity exists, the dictionary being referred to will be 
prefaced by its name (e.g., root dictionary, DB dictionary, 
etc.). 


B. PHYSICAL MEMORY AND I/O PORTS 
1. Hardware and I/O Ports 


Physical memory is that memory in which FORTH 
programs execute. This memory lies entirely within the 
user's address space. The PDBMS's physical memory consists 
of a little more than 32K bytes (see Figure 4.1). The lower 
memory (0000H to 3FFFH) is EPROM, and the high memory (4000H 
to 7FFFH) is RAM. Additionally there are 256 bytes of 
memory located at addresses C000H through C0FFH; the first 
40 bytes of these 256 bytes represent the 2 lines of 20 
characters on the liquid crystal display (LCD). The 
contents of these memory locations are interpreted as ASCII 
encoded data and are mirrored on the LCD. Thus the LCD is 
directly addressable via the system's bus. Finally, memory 
locations FF00H to FFFFH comprise the virtual E2PROM window. 
When a segment is accessed from E2PROM by writing its 
segment number to the segment register and "powering up" the 
E2PROM, it appears at these addresses and may be read from 
and written to. When E2PROM power is off these addresses 
are invalid. 
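
The address ranges just listed can be summarized in a short sketch. This is only an illustration in Python, not PDBMS code; the names `REGIONS`, `region_of`, and `lcd_address` are hypothetical, and the assumption that the two LCD rows are stored back to back is ours.

```python
# Address ranges as given in the text; the region names are illustrative labels.
REGIONS = [
    (0x0000, 0x3FFF, "EPROM"),
    (0x4000, 0x7FFF, "RAM"),
    (0xC000, 0xC0FF, "LCD"),
    (0xFF00, 0xFFFF, "E2PROM window"),
]

LCD_BASE = 0xC000
LCD_ROWS = 2
LCD_COLUMNS = 20


def region_of(address):
    """Return the name of the region holding a 16-bit address, or None."""
    for low, high, name in REGIONS:
        if low <= address <= high:
            return name
    return None


def lcd_address(row, column):
    """Address of an LCD cell, assuming the two rows are stored back to back."""
    if not (0 <= row < LCD_ROWS and 0 <= column < LCD_COLUMNS):
        raise ValueError("cell is outside the 2 x 20 display")
    return LCD_BASE + row * LCD_COLUMNS + column
```

Under these assumptions the 40 display cells occupy C000H through C027H, and any address between the listed regions maps to nothing.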

There are two ports which are directly associated 
with the user's address space and accessible to him. One 
port is a read-only port used to receive data from the 
keyboard (it is envisioned that the keyboard will eventually 
be tied directly to the system's bus). This port is located 
at [?]DH. The other port is a UART port configured for an 
RS232 serial interface and is located at FAH. 

Finally, three locations are set aside as jump 
vectors. These are predetermined by the NSC800 hardware in 
interrupt mode 1, which mimics the Z80. The cold boot vector 
is located at 00H. The non-maskable interrupt (NMI) jump 
vector is found at 66H. This interrupt is generated by two 
conditions: whenever the system is "turned off" by the user 
and whenever the system is reset (via the reset button). 
Because of the slow nature of the E2PROM, it may be possible 
for the user to turn the power off or reset the system 
before a write-cycle involving a large block of data has 
been completed. The virtual memory manager is the ultimate 
recipient of NMIs. Upon receiving one, it waits for the 
write-cycle to be completed and then sets bits 1, 0, and 4 
of the control port accordingly. After setting these bits, a 
jump to warm boot is executed. Setting bit 4 to one when the 
power switch is in the on position has no effect, so the 
same interrupt handling routine correctly handles both 
interrupt sources. Ten seconds after an NMI generated by the 
power-off condition, the hardware automatically shuts itself 
off, if it is still on at that time. The third location is 
38H, which contains the maskable interrupt (MI) vector. Both 
the keyboard and E2PROM generate interrupts which vector 
here; the device requiring service is determined by reading 
the status register (described below). 

Figure 4.1 PDBMS Physical Memory Map. 


Figure 4.1 shows the allocation of physical memory 
to data structures in the PDBMS. It varies from the 
configuration in Figure A.1 only in that it has data buffers and 
pointer buffers. These buffers share memory with the block 
buffers. Block and data buffers are not used concurrently, so 
they do not occupy the buffer area at the same time³. The 
data buffers are used for encoding and decoding individual 
database records. Records are read into the buffers as they 
appear in E2PROM (less Key ID numbers and administrative 
pointers) and then are decoded into their ASCII 
representation, which is placed into the current record buffer and 
the LCD window. Probably only a portion of the record fits 
into the 40 character LCD. The first two bytes of each data 
buffer contain the resident record's virtual pointer (FFFFH 
indicates an empty buffer). 


³Even if the PDBMS is designed so that it LOADs definitions 
from screens during execution of database operations, 
there is no problem. This is because the block buffers are 
not used during a LOAD; the E2PROM is simply read directly 
without using a buffer. 
The pointer buffers serve several purposes. During 
retrieval operations buffer number one holds the pointers to 
records to which the user is authorized access and which 
have satisfied all query conditions processed so far. The 
second buffer holds pointers to records to which the user is 
authorized access and which satisfy the query condition 
currently being processed. After each query condition has 
been processed, the intersection or union of the two buffers 
(depending upon the query) is placed into buffer one. 
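
The buffer merging step can be sketched as follows. This is a minimal illustration, not the PDBMS routine; the function name and the "AND"/"OR" labels for the query connective are assumptions, and Python lists stand in for the fixed-size RAM buffers.

```python
def combine_pointer_buffers(buffer_one, buffer_two, connective):
    """Merge the second pointer buffer into the first after one condition.

    buffer_one: pointers that satisfied every condition processed so far
    buffer_two: pointers that satisfy the condition just processed
    connective: "AND" for intersection, "OR" for union (assumed labels)
    """
    if connective == "AND":
        wanted = set(buffer_two)
        return [p for p in buffer_one if p in wanted]
    if connective == "OR":
        merged = list(buffer_one)
        seen = set(buffer_one)
        for pointer in buffer_two:
            if pointer not in seen:
                merged.append(pointer)
                seen.add(pointer)
        return merged
    raise ValueError("connective must be 'AND' or 'OR'")
```

The result would then be stored back into buffer one before the next query condition is evaluated.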


C. VIRTUAL MEMORY AND CONTROL PORTS 
1. Hardware 


In the PDBMS, E2PROM is used as secondary storage. 
A total of 8K bytes of E2PROM is included and it is 
segmented into 32 segments, each 256 bytes in size. 
Segments (analogous to FORTH blocks) are further divided 
into physical records 16 bytes in size. Figure 4.2 shows 
the bus interface of the Intel 2816 E2PROM chips. As in 
standard FORTH, the user and user programs deal with 
physical addresses only. The user can only refer to virtual 
memory by using screen numbers. However, some PDBMS words 
use two byte virtual addresses to access physical records in 
virtual memory. Only assembly language coded words 
("low-level" words) can directly fetch and store bytes in 
E2PROM via the window. 

PDBMS virtual addresses consist of two bytes. One 
byte contains a segment number and the other a physical 
record number within the segment. Because only four bits 
are needed to designate a physical record, if it were 
technically feasible the system could accommodate 512K bytes 
of E2PROM. 
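
A short sketch may make the two-byte address and the 512K figure concrete. The byte order (segment number in the high byte, record number in the low byte) is an assumption, as are all function names; the 15 usable bits come from the discussion of the high bit in this section.

```python
RECORD_BYTES = 16          # physical record size from the text
RECORDS_PER_SEGMENT = 16   # 256-byte segment / 16-byte records
SEGMENT_BYTES = RECORD_BYTES * RECORDS_PER_SEGMENT


def pack_virtual_address(segment, record):
    """Build a two-byte virtual address (assumed: segment high, record low)."""
    if not 0 <= segment < 32:
        raise ValueError("the prototype has 32 segments")
    if not 0 <= record < RECORDS_PER_SEGMENT:
        raise ValueError("a segment holds 16 physical records")
    return (segment << 8) | record


def unpack_virtual_address(address):
    """Split a two-byte virtual address into (segment, record)."""
    return (address >> 8) & 0x7F, address & 0x0F


def window_offset(record):
    """Byte offset of a physical record inside its 256-byte segment."""
    return record * RECORD_BYTES


def addressable_bytes(usable_bits=15, record_bits=4):
    """Capacity if every bit not needed for the record number named a segment."""
    segment_bits = usable_bits - record_bits
    return (1 << segment_bits) * SEGMENT_BYTES
```

With 4 of the 15 usable bits naming a record, 11 bits remain for segments, and 2048 segments of 256 bytes give the 512K bytes quoted above.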


Figure 4.2 2816 E2PROM Configuration. 
Only 15 of the 16 bits are used for virtual 
addresses. The remaining bit (bit 7 of the Most Significant 
Byte, MSB) is used to differentiate virtual from physical 
addresses in E2PROM and RAM. Virtual addresses which move 
from E2PROM to RAM and vice versa must pass through low 
level FORTH words which ensure RAM and E2PROM virtual 
addresses never get mixed in with each other. E2PROM 
virtual addresses have their high bit set to zero while RAM 
virtual addresses have their high bit set to one. Thus 
virtual addresses appear to be out-of-range references 
within the domain in which they occur. For example, if an 
address referenced inside an E2PROM segment is less than 
8000H, then it is a virtual address to another segment. 
Intra-segment addresses are always greater than or equal to 
FF00H (all of which have a high bit of one). This means 
that, as in standard FORTH, "programs" cannot be executed 
directly from secondary storage but must be LOADed first. 
This allows all code field addresses (CFA) to be interpreted 
as physical addresses, whether they occur in RAM, EPROM, or 
E2PROM, so there is no problem associated with storing 
constants and variables in E2PROM. Care must be exercised 
to ensure that LCD window addresses are never used in the 
same RAM context as RAM virtual addresses since they would 
be indistinguishable from each other. 
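
The high-bit convention can be expressed as two small predicates. This is an illustrative sketch with hypothetical function names; it only restates the rule above and does not model the low level FORTH words that convert between the two forms.

```python
def high_bit(value):
    """Bit 7 of the most significant byte of a 16-bit value."""
    return (value >> 15) & 1


def is_virtual_in_e2prom(value):
    """Inside an E2PROM segment, values below 8000H are virtual addresses."""
    return high_bit(value) == 0


def is_virtual_in_ram(value):
    """Inside RAM, values of 8000H and above are virtual addresses."""
    return high_bit(value) == 1
```

Note that `is_virtual_in_ram(0xC000)` is true, which is exactly the LCD window ambiguity the paragraph warns about.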

The E2PROM can be read in 450 usec; however, it 
requires 20 msec⁴ to write one byte (all of the bytes on 
each chip may be erased in one 10 msec operation). 
Additionally, the 2816 must be strobed with a 21 volt pulse 
during the write process. This means that E2PROM cannot be 


⁴Intel literature states that their E2PROM requires 10 
msec per write, which is true. However, in order to ensure 
that the data is properly recorded, the addressed byte 
should contain FFH before it is written into if a write 
requires a zeroed bit to be changed to one. Thus writing 
involves two write operations: one to set the target byte to 
FFH, and a second to write the desired value. 
treated the same as RAM. Other non-volatile memories were 
considered for this design, such as NOVRAM and Instant ROM. 
Both of these alternatives can be treated almost as if they 
were RAM; however, they were judged unsuitable. NOVRAM was 
not found to be a feasible choice because of its small size. 
The largest NOVRAM chip contains only 256 bytes, thus 8K of 
NOVRAM cannot be battery powered because of the large number 
of chips that would be required. Instant ROM was also found 
to be undesirable because it contains its own battery power. 
The on-chip battery is guaranteed for three years, and this 
is hardly suitable for a permanent database. Currently 
available hand-held computers use concepts similar to 
Instant ROM; they use CMOS memories which are constantly 
refreshed, even when they are turned "off." 

The E2PROM and the PDBMS are controlled through three 
control ports. One port, the segment register, is used to 
select the desired segment. This port is located at F8H and 
is write-only. The second port is the status register. It 
is located at F9H and it is read-only; it reflects the 
system's current status. Figure 4.3 shows the status port's 
configuration. Complementing the status register is the 
control register, which is a write-only port located at F9H. 
The control register is used to effect system changes. This 
port is described in Figure 4.4. These ports, as well as 
the other ports, are "smart" ports in that they only accept 
instructions from code being executed from EPROM. The 
hardware does this by checking the program counter which the 
NSC800 places on the address bus prior to an opcode fetch. If 
the A15 and/or A14 lines of the address bus are high the 
next instruction is ignored. E2PROM power and write-power 
are turned on and off by setting bits 0 and 1 accordingly. 
Whenever either of these bits is set to one, bit 7 of the 
status register is set to zero. After the chips have been 
powered-up, bit 7 of the status register is set to one, so 
Bit   Flag Meanings                          Boot-up Values 

7     1: EEPROM ready                        1 
      0: EEPROM not ready 
6     1: EEPROM write-power is on            0 
      0: EEPROM write-power is off 
5     1: EEPROM interrupt pending            0 
      0: No EEPROM interrupt pending 
4     Not used                               n/a 
3     Not used                               n/a 
2     1: Keyboard interrupt pending 
      0: No keyboard interrupt pending 
1     1: UART receiver ready                 n/a 
      0: UART receiver not ready 
0     1: UART transmitter ready              n/a 
      0: UART transmitter not ready 

Figure 4.3 Status Port Flags (IN 9FH). 
is bit 6 or 5 (depending upon whether bit 0 or 1 of the 
control register had been set). Additionally, whenever bit 
7 is set to one (except during a cold boot of the system), 
an MI is generated. When bit 7 of the control register is 
set to one, bit 7 of the status register goes to zero. When 
the E2PROM write-cycle has been completed, bit 7 goes high 
and an MI is generated. 
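
The "smart" port test on the address lines amounts to checking that the opcode was fetched from the EPROM range. The sketch below is a software illustration of that hardware check; the function name is hypothetical.

```python
EPROM_TOP = 0x3FFF  # EPROM occupies 0000H to 3FFFH


def port_access_allowed(program_counter):
    """True when the opcode was fetched with A15 and A14 both low."""
    a15 = (program_counter >> 15) & 1
    a14 = (program_counter >> 14) & 1
    return a15 == 0 and a14 == 0
```

Because EPROM fills exactly the lowest quarter of the address space, testing the two top address lines is equivalent to comparing the program counter against 3FFFH.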

Changes in bits 0 and 1 of the status register do 
not generate interrupts, but when bit 2 goes high (indi- 
cating keyboard input) an MI is generated. Reading the 
status register resets bit 2 to zero. 
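
A maskable interrupt handler has to read the status register to find the device needing service. The following toy model is a sketch only: bit 7 and bit 2 are as described in the text, while the position of the E2PROM interrupt flag (bit 5 here) is an assumption based on the order of the flags in Figure 4.3. The class and function names are hypothetical.

```python
EEPROM_READY = 0x80      # bit 7, from the text
EEPROM_PENDING = 0x20    # bit 5, assumed position of the E2PROM interrupt flag
KEYBOARD_PENDING = 0x04  # bit 2, from the text


class StatusRegister:
    """Toy model of the read-only status port."""

    def __init__(self, value=EEPROM_READY):
        self.value = value & 0xFF

    def read(self):
        """Return the current flags; reading resets the keyboard bit."""
        value = self.value
        self.value &= ~KEYBOARD_PENDING & 0xFF
        return value


def devices_to_service(status):
    """Names of the devices a maskable-interrupt handler would service."""
    devices = []
    if status & KEYBOARD_PENDING:
        devices.append("keyboard")
    if status & EEPROM_PENDING:
        devices.append("e2prom")
    return devices
```

A handler would call `read()` once and pass the result to `devices_to_service`, since a second read no longer shows the keyboard flag.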

Notice from Figure 4.2 that the four 2816 chips are 
interleaved so that all addresses equal to zero, mod four, 
are on the first chip (i.e., those addresses whose last 
hexadecimal digits are 0, 4, 8, or C). Those equal to one, 
mod four, are on the second chip, etc. This arrangement 
facilitates fast writing of blocks of data to E2PROM because 
four contiguous bytes may be written simultaneously. Thus 
in the best case (when four contiguous bytes are written) 
the average write-time per byte is approximately 5 msec and 
an entire segment can be written in 1.25 seconds. Actually 
more time is required, but the additional time is minor when 
compared to the gross nature of the E2PROM write-time. The 
additional time involves reading and comparing the contents 
of the E2PROM to the appropriate buffer's contents (data or 
block buffer). The entire write-cycle algorithm is shown in 
Table II. 
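
The interleaving and the four-byte write-cycle can be simulated in a few lines. This is a sketch, not the PDBMS routine: a `bytearray` stands in for the E2PROM window, the hardware write-cycle and interrupt wait are reduced to a comment, and the erase test follows the rule in the footnote (a byte is first set to FFH only when a zero bit must become one). All names are hypothetical.

```python
BYTE_WRITE_MS = 20   # per-byte write time quoted in the text
CHIPS = 4            # interleaved 2816 devices


def chip_for(address):
    """Index (0 to 3) of the chip holding a byte address."""
    return address % CHIPS


def needs_erase(current, wanted):
    """True when some bit that is 0 in E2PROM must become 1."""
    return (wanted & ~current & 0xFF) != 0


def write_segment(e2prom, buffer):
    """Copy buffer into e2prom four bytes at a time; return (groups, msec)."""
    if len(e2prom) != len(buffer) or len(buffer) % CHIPS:
        raise ValueError("expected equal lengths that are a multiple of four")
    groups = 0
    for start in range(0, len(buffer), CHIPS):
        changed = False
        for i in range(start, start + CHIPS):
            if e2prom[i] != buffer[i]:
                if needs_erase(e2prom[i], buffer[i]):
                    e2prom[i] = 0xFF      # first write: set the byte to FFH
                e2prom[i] = buffer[i]     # second write: the desired value
                changed = True
        # The hardware write-cycle and the wait for its interrupt go here.
        for i in range(start, start + CHIPS):
            if e2prom[i] != buffer[i]:
                raise IOError("E2PROM write error")
        if changed:
            groups += 1
    return groups, groups * BYTE_WRITE_MS
```

For a full 256-byte segment this model counts 64 groups of four bytes, or about 1.28 seconds at 20 msec per group, in line with the figure of roughly 1.25 seconds given above.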


The 8K bytes of E2PROM are divided into two types of 
segments: system segments and block (or screen) segments. 
System segments are owned by the system and cannot be 
directly accessed by the user or his programs. Block 
segments are those which contain screens, in the usual FORTH 
Bit   Bit Set Meanings 

7     1: Start EEPROM write-cycle 
      0: No effect 
6-5   Not used 
4     1: Turn system off (EEPROM must be off first) 
      0: No effect 
3     Not used 
2     Not used 
1     1: Turn EEPROM write-voltage on 
      0: Turn EEPROM write-voltage off 
0     1: Turn EEPROM power supply on 
      0: Turn EEPROM power supply off 

Figure 4.4 Control Port Flags (OUT 9FH). 
TABLE II 
Virtual Memory Write-cycle Algorithm 

J = START_OF_SEGMENT; 
REPEAT UNTIL NO MORE BYTES; 
   DO I = J TO J+3; 
      READ E2PROM_BYTE(I); 
      IF BUFFER_BYTE(I) ≠ E2PROM_BYTE(I) THEN DO; 
         IF BUFFER_BYTE(I) & E2PROM_BYTE(I) ≠ 0 THEN 
            E2PROM_BYTE(I) = FFH; 
         E2PROM_BYTE(I) = BUFFER_BYTE(I); 
      END DO; 
   END DO; 
   CONTROL_PORT_BITS(7) = 1; 
   LOW POWER HALT;   /* WAIT FOR INTERRUPT */ 
   DO I = J TO J+3; 
      READ E2PROM_BYTE(I); 
      IF BUFFER_BYTE(I) ≠ E2PROM_BYTE(I) THEN 
         SIGNAL (E2PROM_WRITE_ERROR); 
   END DO; 
   J = J + 4; 
END REPEAT; 

sense, and are available to the user. Blocks are allocated 
sequentially in a round-robin fashion by the memory manager. 
This means that the next segment to be allocated is the next 
higher unallocated segment after the last allocated segment. 
When the 32nd segment is reached, allocation begins again 
from the first segment not initially assigned to the system 
(i.e., when the software was placed into the system). This 
scheme is used in an attempt to distribute the E2PROM use 
more uniformly. If a "lowest available segment algorithm" 
were used, there would be a higher probability that the 
portion of E2PROM assigned to the low numbered segments might 
"burn out" (E2PROM is limited to 10,000 write operations to 
each individual byte). 


a. System Segments 


System segments are those which are used by the 
PDBMS for virtual memory management data structures and the 
database. The user cannot directly access these segments 
because any segment allocated to the system is not placed in 
the block number dictionary. System routines address these 
segments directly (i.e., they "know" the physical segment 
numbers whereas the user knows only virtual block or screen 
numbers). At least four segments are dedicated to the 
system; the system and the user compete for the remaining 
segments (less system message screens), which are allocated 
on a first-come, first-served basis. Additional system 
segments (beyond the dedicated four) are used to accommodate 
the expanding database. Because the database resides in 
system segments, the user cannot see its physical 
structure; he is limited to viewing it through the PDBMS. The 
first four segments are structured as described below. 


(1). Parameter Table. This segment contains a 
collection of system parameters and tables. For example, 
most of the cold boot parameters are loaded from here. Also 
located here is the vocabulary table. 

(2). Key Sub-Dictionary. Security in the PDBMS 
is provided in part by Keys. These Keys are used to seal 
records, blocks, and other Keys. These Keys are maintained 
in a linked list dictionary as a separate VOCABULARY. The 
Key vocabulary definition is located in EPROM. The code 
pointer of each Key points to the run-time code for CONSTANT 
which is located at docon. Thus when the Key is executed, 
it returns the contents of its two byte parameter field 
address (PFA). The value held in the PFA may have two 
meanings. If the value returned is less than 128, then it is 
the Key's identification number (ID). If it is greater than 
128, then the value returned is a virtual pointer to a 
sealed record containing the Key's ID number. The Key ID 
value FFH is reserved for the null Key, while the value 00H 
is reserved for the system's Key. Also the value FEH is 
used as a substitute for the IDs of deleted Keys in access 
descriptors. The use of Keys is discussed in 
greater detail in Chapter VI. The Key vocabulary, besides 
containing Keys, contains words; these words are stored in 
EPROM. 

(3). Block Number Dictionary. The segment 
containing this is divided into three parts. Four bytes are 
set aside as the segment allocation table, four bytes are 
used as the segment allocation sequencer table, and the rest 
of the segment is used as a vocabulary for virtual block 
numbers. Each bit in the segment allocation table 
represents a segment. If a bit is set to one, the corresponding 
segment has been allocated. The sequencer table has only 
one bit set, the one corresponding to the last segment 
allocated. 
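
The two four-byte tables are enough to drive the round-robin allocation policy described earlier. The sketch below uses a 32-bit Python integer in place of each table; the bit order, the assumption that the four dedicated system segments are segments 0 through 3, and all names are ours rather than the PDBMS's.

```python
SEGMENTS = 32
FIRST_USER_SEGMENT = 4   # assumption: segments 0-3 are the dedicated system segments


def allocate_next(allocation, last_allocated):
    """Round-robin search; returns (segment, new allocation bitmap).

    allocation is a 32-bit integer standing in for the four-byte table,
    with bit n set when segment n is in use (the bit order is assumed).
    """
    span = SEGMENTS - FIRST_USER_SEGMENT
    start = max(last_allocated, FIRST_USER_SEGMENT - 1)
    for step in range(1, span + 1):
        candidate = FIRST_USER_SEGMENT + (start - FIRST_USER_SEGMENT + step) % span
        if not allocation & (1 << candidate):
            return candidate, allocation | (1 << candidate)
    raise MemoryError("no free segment")


def sequencer_table(last_allocated):
    """Bitmap with only the bit of the most recently allocated segment set."""
    return 1 << last_allocated
```

Starting the search just after the last allocated segment, instead of at the lowest free one, is what spreads write wear across the E2PROM.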

The virtual block numbers are maintained 
as a FORTH vocabulary, as are the Keys. Also like the Key 
vocabulary, the definition of the block number vocabulary is 
located in EPROM. However, unlike the Keys, virtual block 
numbers are fixed length name, one byte constants. This 
allows virtual numbers to be assigned to all of the 
originally unallocated segments. This limits block numbers to 
four characters in length. The dictionary is static and 
always contains 28 entries. Entries are removed from the 
dictionary by blanking out their virtual number (i.e., the 
entry's name field) and setting the smudge bit so they will 
not be found. When a virtual block number is entered by the 
user, the entire dictionary is searched. For example the 
following keyboard entries would trigger searches of the 
dictionary for "1" and "25" respectively. 


1 LIST 
25 LOAD 


If "1" had not been found in the dictionary a block buffer 
(located in physical memory) would have been allocated to 
virtual block "1." The virtual number "1" would not be 
entered into the block number dictionary until it was 
written to E2PROM. If "25" had not been found the usual 
FORTH error condition would have been raised. 

(4). The Database Segment. This block is 
broken into two parts. The first contains a jump table into 
the DB dictionary. There is one jump vector for each 
printable ASCII character allowed by the system (a maximum of 
64). A character's jump vector is hashed to using the 
following equation on the character's hexadecimal value 
(called "char"). 


Location of jump vector = 
(ie pad eo 2H) FPP) te FE O0H 


If the vector is equal to zero, then the character is 
punctuation (as described in Table I). Punctuation is not 
stored in the DB dictionary. If the vector is equal to 
FFFFH (uninitialized E2PROM), then there are currently no 
wordds in the dictionary starting with that letter. 
Otherwise the vector is the virtual address of the first 
physical record in an alphabetical linked list of wordds 
beginning with that letter. The next four bytes of this 
segment contain a bit map of the segments. Like the segment 
allocation table, a bit is set to one if the corresponding 
segment belongs to the database. 
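
The three cases for a jump vector can be written as a small decision function. This is a sketch with hypothetical names; the hash that selects the table slot is not reproduced here, and the assumption that each two-byte vector is stored high byte first is ours.

```python
PUNCTUATION = 0x0000   # vector value marking a punctuation character
EMPTY = 0xFFFF         # uninitialized E2PROM: no wordds for this letter


def read_vector(table, slot):
    """Fetch the two-byte vector in a slot (assumed high byte first)."""
    return (table[2 * slot] << 8) | table[2 * slot + 1]


def interpret_jump_vector(vector):
    """Classify a jump vector as punctuation, an empty list, or a list head."""
    if vector == PUNCTUATION:
        return ("punctuation", None)
    if vector == EMPTY:
        return ("empty", None)
    return ("list", vector)
```

Only the third case leads into the DB dictionary; the returned value is then the virtual address of the first wordd in that letter's linked list.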

The second half of this database segment 
is used for the beginning of the file and field name 
vocabulary. Field entries are simply FORTH constants which return 
their field ID number (0 to 255). File entries are modified 
FORTH vocabulary definitions (they contain five extra bytes 
used to store pointers to the first and last records in the 
file, and a field count). The field names are entries in 
the "file vocabulary" to which they belong. This allows 
FORGET to be used to delete files. Of course FORGET is not 
sufficient by itself; the virtual memory allocated to the 
forgotten entries must be turned back to the system. 
Because of the nature of record entries in the PDBMS, fields 
cannot be individually forgotten. As with the Key 
vocabulary, the file vocabulary definition, as well as some other 
words, resides in EPROM. 

When information is added to the database, 
it expands in three ways. First the file and field 
vocabulary grows to accommodate new file and field definitions. 
This dictionary may spill into additional segments. 
Allowing this dictionary to exist in more than one segment 
creates some problems which must be specifically addressed 
by the interpreter/compiler. Off-segment references can 
only address 16-byte physical records, so entries of this 
type cannot be positioned in a "format-free" manner. Thus 
entries in this vocabulary are all placed in memory taking 
the physical record into consideration (i.e., beginning on a 
physical record boundary). A benefit of this is that the 
entries may be mixed into the same segments with the DB 
entries, file logical records, and sealed Keys. 

The database itself may be considered a 
totally inverted file system. Records contain only PDBMS 


information and pointers to dictionary entries of wordds 
which appear in the record. Figure 4.5 shows a typical 
entry in the PDBMS. The system knows how many fields are in 
the currently open file, so it uses the last field's 
end-of-field (EOF) as the end of record marker (EOR). The 
EOF is the same character as the null Key, making FFH (blank 
E2PROM) a general system end-of-data marker. When a logical 
record is broken over a physical record boundary, the last 
two bytes of the physical record contain a pointer to the 
next physical record. 

Fields are strings of ASCII characters 
followed by an entry ID number. The ASCII letters are the 
initial letters of the wordds (i.e., transformed uwords) 
originally entered into the record by the user. The letters 
are used to hash to the jump vector table on the first 
segment of the database. DB dictionary entries are 
maintained in an alphabetical linked list. The correct wordd 
corresponding to the uword entered into the record is found 
by matching the ID number following the letter used as input 
to the hash function to the ID number of a wordd on the 
linked list hashed to. Punctuation is not followed by an ID 
number and the record decoding routines "know" not to look 
for an ID number in the record because punctuation jump 
vectors are equal to zero. 

Figure 4.6 shows a typical dictionary 
entry. This structure is an expanded and modified version 
of the one used in Craig language translators [5]. The 
entries are designed to take advantage of the alphabetical 
nature of English language dictionaries. The first byte 
contains a zero and is ignored when traversing the DB 
dictionary during a wordd look-up. It is placed there to 
prevent an accidental retrieval by non-dictionary routines 
which always treat the first byte as a Key. The second 
byte, the copy byte, contains the number of leading 
characters in the current wordd which match the leading characters 
Figure 4.5 Database Physical Record Structure. 
in the previous wordd on the linked list. The link bytes 
contain a pointer to the next wordd in the linked list. The 
add byte contains a number which, when added to the 
"copy byte + 1" character of the previous wordd, yields the 
correct "copy byte + 1" character of the current wordd. The 
bytes following the add byte contain the ASCII characters of 
the current wordd after the "copy byte + 1" character. The 
last character's high bit is set to one as an end of string 
delimiter. If there are no characters following the 
"copy byte + 1" character then the byte following the add 
byte contains FFH (which translates to an ASCII delete). 
The wordd ID byte contains the wordd's ID number. This is 
used when decoding records. Figure 4.6 shows how the DB 
entries for "FORGET" and "FORTH" would appear if they were 
consecutive entries and "FORGET" was the first "F wordd." 
Following the last unique character is a linked list of 
field ID numbers with pointers to records containing the 
field associated with the corresponding field ID. These 
field numbers and pointers are used in retrieval operations. 
Records are retrieved by specifying field names and uwords. 
Obviously punctuation cannot be used for retrieval since 
only wordds are stored in the DB dictionary. 

Figure 4.7 shows how the dictionary is 
traversed to find the desired wordd. Uwords are reassembled 
in the PAD by making the changes indicated by the copy byte, 
add byte, and unique characters as the list is traversed. 
That is, when the DB dictionary linked list is entered, the 
first wordd in the list is copied out into the PAD. If this 
is not the target wordd, then the second entry in the linked 
list is moved to. Using the information in the copy byte, 
the add byte, and the unique characters, the second wordd in 
the list is constructed. In moving from "FORGET" to "FORTH" 
as shown in Figure 4.6, "FORGET" would be written into the 
PAD as the first wordd in the linked list of "F wordds." 


Figure 4.6 Structure of a DB Dictionary Entry (panels: Typical 
DB Dictionary Entry; "FORGET" and "FORTH" as DB Dictionary 
Entries). 
When the search continued past "FORGET" because it was not 
the target wordd, the first three letters in the PAD would 
be left because the copy byte of the second entry is 3. 
Then 13 would be added to the fourth letter (G) because that 
is the contents of the add byte. This would change the 
fourth letter from a "G" to a "T." Then the fifth letter, 
and any subsequent ones, would be replaced by the unique 
characters (in this case "ET" would be overwritten with an 
"H"). At this point the PAD contains the wordd "FORTH." 
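
The reconstruction of one wordd from its predecessor can be checked with a short sketch. Python strings stand in for the PAD, the unique characters are assumed to have already had their high-bit terminator stripped, and both function names are hypothetical; `encode_wordd` is included only as a helper for checking the decoder.

```python
def next_wordd(previous, copy_count, add_value, unique):
    """Rebuild the next wordd in the list from the one already in the PAD.

    previous:   wordd currently in the PAD
    copy_count: value of the copy byte (leading characters kept)
    add_value:  value of the add byte
    unique:     remaining characters, terminator already stripped
                (an empty string stands for the FFH filler byte)
    """
    kept = previous[:copy_count]
    changed = chr(ord(previous[copy_count]) + add_value)
    return kept + changed + unique


def encode_wordd(previous, current):
    """Inverse of next_wordd: returns (copy byte, add byte, unique characters)."""
    copy_count = 0
    limit = min(len(previous), len(current)) - 1
    while copy_count < limit and previous[copy_count] == current[copy_count]:
        copy_count += 1
    add_value = ord(current[copy_count]) - ord(previous[copy_count])
    return copy_count, add_value, current[copy_count + 1:]
```

For the example in the text, `next_wordd("FORGET", 3, 13, "H")` keeps "FOR", turns "G" into "T", and appends "H", giving "FORTH".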
Once a wordd has been placed into the 
dictionary, its first physical record is never returned to 
the system to be reallocated. If all instances of a wordd 
are removed from the database, the high bit of the copy byte 
is set to one. Subsequent searches of the dictionary will 
not "see" a wordd if its copy byte contains a negative 
number (two's complement). Because the dictionary is a 
linked list, this memory may be reused in the same list by 
reattaching it at a different point in the list. When the 
first record is reused, the new wordd placed in it uses the 
ID number assigned to the first wordd to use the record. 
This is done to make ID assignment easier and to stave off 
the possibility of running out of ID numbers⁵. Physical 
records other than the first may be returned to the system 
when a wordd is deleted. 

In segments acquired by the system to 
accommodate database expansion, only 15 physical records are 
used for the database. The first record (record 0) contains 
administrative information such as a record allocation map 
for the segment. 


⁵The maximum ID number is 255. The statistics in 
Appendix B indicate that even in an aggregate of four 
address books, the maximum number of unique wordds is not 
that large. 
Figure 4.7 DB Dictionary Wordd Look-up. 

b. Screen Segments 


These segments belong to the user for use as 
FORTH screens. A screen segment is divided into two parts. 
The first physical record contains the screen's access 
descriptor. The rest of the records contain the part of the 
segment the user sees as a screen. A screen consists of 16 
rows of 15 characters. This is much smaller than the 
standard FORTH screen which is 16 rows of 64 characters. 
The smaller screen is better suited to the 2 row by 20 
character LCD. 
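
The screen dimensions follow directly from the segment layout, as this small sketch shows. The constants come from the text; the assumption that screen rows are stored consecutively after the access descriptor, and the function names, are ours.

```python
SEGMENT_BYTES = 256
RECORD_BYTES = 16      # the first physical record holds the access descriptor
SCREEN_ROWS = 16
SCREEN_COLUMNS = 15


def screen_bytes():
    """Bytes left for screen text once the access descriptor is set aside."""
    return SEGMENT_BYTES - RECORD_BYTES


def screen_offset(row, column):
    """Offset of a screen character within its segment (rows assumed contiguous)."""
    if not (0 <= row < SCREEN_ROWS and 0 <= column < SCREEN_COLUMNS):
        raise ValueError("position is outside the 16 x 15 screen")
    return RECORD_BYTES + row * SCREEN_COLUMNS + column
```

The 240 bytes that remain after the descriptor are exactly 16 rows of 15 characters, so the last screen character lands on the last byte of the segment.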

When the system is first initialized (i.e., when 
the software is first placed on the hardware), some of the 
screen segments are used to store system messages, as in 
standard FORTH. Additionally, some screens are used to 
store some of the definitions used in the PDBMS, 
particularly those used with the naive user interfaces. This 
allows the user to eliminate or change these definitions and 
system messages as he sees fit. 
V. THE DEVICE DESCRIPTION 

At the time of this writing, the PDBMS is in the process 
of being prototyped. This first prototype is not intended 
to meet all of the desired characteristics of a PDBMS. For 
example, it cannot be hand-held because it is bread-boarded 
and a standard keyboard is used; additionally it requires 
more than one power supply because not all of the CMOS 
components have been received. What is described in this 
chapter is the outline of the final prototype as it is 
envisioned at the present time. For the most part, this is a 
description of the PDBMS as it would appear to the user. 


A. THE HARDWARE 


From the user's point of view, the hardware consists of 
four major components: 1) the enclosure, 2) the display, 3) 
the keyboard, and 4) the electronics inside. These aspects 
involve how the system physically appears to the user, not
how he perceives it to work. 


1. The Enclosure


The enclosure should be as small as possible and yet
still be useful. The major constraints upon how small the
PDBMS can be made are the size of the display and the
keyboard. The minimum practical size available with
currently available products is approximately 9 inches (23
cm) by 4 inches (10 cm) by 1 inch (2.5 cm). This is the
average size of most of the hand-held computers today, such
as those made by Panasonic, Radio Shack, and IXO [6 and 7].
These systems tend to weigh around 14 ounces (400 gm).
Their size seems to be the smallest practical one in order
to keep the keys far enough apart to minimize the chances of
hitting the wrong key or hitting two keys at once⁶. It is
doubtful that the display will be shrunk; if anything,
future displays will be larger and allow smaller fonts, thus
allowing more information to be shown. Ultimately, it could
be possible for the display to dominate the front of the
PDBMS if voice input were incorporated. This would most
certainly require a large display because function keys
would probably not be used (or even desired) and the system
would be expected to echo all vocal input so that the user
could verify that he had been correctly understood.

The back of the enclosure opens to allow batteries
to be changed and E2PROM to be added in or taken out. This
last feature would not only allow the user to expand his
memory (or treat it like a floppy disk, i.e.,
interchangeable secondary storage), but also allow the
transportation of software and data from one PDBMS to another
by a means other than through the RS232 port. The hardware
and software of the first prototype do not include an ability
to add more E2PROM, but the required modifications are minor.

It should be mentioned that the current implementation
of Keys does not gracefully support the transportation
of sealed objects from one system to another by physical
transportation. There is no way to guarantee that security
would be uniformly enforced, independent of the system in
which the objects are found, because key assignments are
local in context.


⁶The size of the keys is really unimportant so long as
the user feels comfortable using them. This normally is
taken to mean that the keys should not be physically
uncomfortable to use and they should provide some sort of
tactile and audible response upon being struck.


2. The Display


The current display is an LCD which contains two
rows of 20 characters each. This is larger than the
displays in most of the currently available hand-held
computers. These normally have one row of 16 to 20
characters. It was felt that two lines were the minimum
acceptable number of lines for the PDBMS. Two lines allow
user commands and responses to appear on one line and the
system responses and prompts to appear on the other. This
allows the user to compare his commands and responses with
the system's. Ideally the PDBMS should have a larger
display. The largest LCD displays available at this time
have four lines with 40 characters per line; however, these
are too expensive to be compatible with cost criteria of the
PDBMS⁷.


3. The Keyboard 


Most of the keys should be 3/16 inch (0.5 cm) square
and protrude from the keyboard background by 1/8 inch (0.3
cm). The keys are separated by 1/4 inch (0.6 cm). These
dimensions are used on most of the Hewlett-Packard
calculators for the arithmetic keys (i.e., + - ÷ ×). Using
them as an example, the author found that keys were easily
differentiated from one another, and two or more keys were
almost never pushed simultaneously. The keys should be
arranged by function with the background colored differently
for the letters, numbers, and special function keys, similar
to what was done on the Quasar and Panasonic computers [6].
The on/off switch should be away from the other keys and be
a sliding switch, not a push switch. This should be done to
help prevent the accidental switching on or off of the
power.


⁷LCD is the only flat display [...] available which is
power efficient enough [...] good battery powered system.
LED and [...] are much less power efficient.


The letter keys should be arranged in the standard
"QWERTY" format, not only because of its entrenched place in
the English speaking world [1], but also because it has been
found to be more effective than previously thought relative
to some keyboards designed using human engineering
principles, especially with novice users [8]. At the present
only upper-case letters are planned to be provided to the
user for text entry. Below is a list of the keys and their
functions.

a. Letter and Digit Keys


These keys act in the usual and expected
fashion; they are used to enter the ASCII representation of
the desired character. Input from these keys is handled as
it normally would be in any FORTH system. The letter keys
may also be used as "function keys." When shifted, using
the shift key, the ASCII code for the key's lower-case
equivalent is generated. These "illegal" characters are
treated similarly to LaFORTH words; that is, they are
interpreted immediately upon input [9]. Usually the function
accomplished by these words is to place into the input
message buffer and the LCD window the ASCII string
representation of other words; they do not appear in the
input message buffer or on the LCD⁸. For example, in the
database management application a shift-G causes the word GET
to be placed in the message buffer and the LCD window so when
the return key is eventually pushed, WORD will find GET in the
buffer, not shift-G. Notice that the keys may perform
different functions depending upon the current vocabulary.
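
The expansion of shifted letter keys described above can be sketched as follows. This is only an illustrative model in Python, not the PDBMS's FORTH code: the function name feed_keys and the two key maps are hypothetical, and lower-case characters stand in for the codes generated by shifted keys.

```python
# Hypothetical model: a lower-case code (a shifted letter key) is replaced
# immediately by the word bound to it in the current vocabulary, so the
# input message buffer never holds the shifted character itself.

# Bindings taken from the database key list (shift-G gives GET, and so on).
DB_KEYMAP = {"g": "GET", "p": "PUT", "d": "DELETE"}
# An invented second vocabulary, only to show that bindings may differ.
OTHER_KEYMAP = {"g": "GOTO"}


def feed_keys(keys, keymap):
    """Return the input message buffer contents after typing `keys`."""
    buffer = []
    for ch in keys:
        if ch.islower():
            # "Input immediate": look the key up as soon as it is typed.
            if ch not in keymap:
                raise LookupError("no word bound to shift-" + ch.upper())
            buffer.append(keymap[ch])
        else:
            # Ordinary character: stored as typed.
            buffer.append(ch)
    return "".join(buffer)
```

With these assumed maps, the same physical key yields GET in one vocabulary and a different word in another, which is the vocabulary dependence the text points out.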


⁸When they must be [...]


b. Mathematical Keys


These keys are similar to the shifted lettered
keys; however, they act as input immediate words without
shifting them. That is, they always cause a search of the
current vocabulary. This was done so that the user can
choose to use either infix or postfix notation (infix
notation is the default definition of these keys in the
"naive" calculator vocabulary). These keys include the
following five keys:

+ - × ÷ %

c. Special Function Keys


These keys are the usual terminal editing keys, 
and with the exception of the "NEXT" keys, they are not 
programmable. The keys are described below. 

(1). Enter. This key causes a carriage return
and line-feed to be placed into the input which is reflected
upon the LCD. This causes the interpreter to begin parsing
the input.

(2). Del. This causes a control-H to be input
and acts as a character deletion key. It backs up the
cursor one position and displays a space on the LCD.

(3). >. This moves the cursor to the right one
character position without affecting the contents of the LCD
window or the message buffer.

(4). <. This moves the cursor to the left one
character position without affecting the contents of the LCD
window or the message buffer.

(5). Shift. This is a non-locking shift key
used with other keys to elicit their alternate definitions.

(6). X>. This deletes all input from, and
including, the current cursor position to the end of the
line.

(7). NEXT↑ and NEXT↓. These keys are used to
scroll the display to the next line above or below,
respectively. In the database application, the shifted NEXT
keys are used to scroll to the next field above and below the
current field. This allows fields to include carriage
returns and line-feeds so that a field need not be
constrained to one logical line on the display.


B. THE SOFTWARE


When the user initially receives the system, he is
presented only with two functions: a calculator and a
database manager. He does not have direct access to ROOT.
This was done to help prevent the user from inadvertently
destroying the system before he understands it. For
example, it prevents him from redefining or forgetting a
word accidentally. The user can expand the scope of the
system gradually as he learns more about it until he can, if
he chooses, run it strictly in FORTH (or even redesign the
system to a great extent). This flexibility is gained by
using FORTH execution vectors. In the case of interfacing
with different levels of users, there is a different version
of FIND for each level of user sophistication. So as the
user becomes more adept with the system, the vector
associated with FIND is simply made to point to a new, more
powerful version of FIND's run-time code. The version
initially available to the user only searches the limited
calculator and database management vocabularies; the ROOT
vocabulary is not searched. The version available to the
most sophisticated user includes a modified version of the
standard FORTH FIND. All FINDs have been modified to be a
little more user friendly. Instead of reporting the usual
"[...] UNDEFINED," when a word is not found, the PDBMS
reports the current vocabulary's name as well. So for
example if the user entered a {:} when he was using the
database vocabulary where it is undefined, the system would
report, "NOT DATABASE WORD." Notice that this message may
fall off the right-hand side of the display for some words;
but the first word of the message should cue the user to the
error, and if he then realizes that he has forgotten what the
current vocabulary is he can move the display to the right
using the cursor control keys.
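
The use of an execution vector for FIND can be illustrated with a short sketch. It is written in Python rather than FORTH, and everything in it is an assumption made for illustration: the vocabulary contents, the names make_find, naive_find, expert_find and System, and the display label for the calculator. Only the idea (one vector, repointed as the user advances, and an error message that names the current vocabulary) comes from the text.

```python
# Hypothetical vocabularies; the real word lists are not given here.
VOCABULARIES = {
    "CALC": {"+", "-", "="},
    "DB": {"GET", "PUT", "SEAL", "UNSEAL"},
    "ROOT": {":", ";", "FORGET", "SEAL", "UNSEAL"},
}
# Names shown in error messages ("DATABASE" is from the text; the
# calculator label is assumed).
LABELS = {"CALC": "CALCULATOR", "DB": "DATABASE", "ROOT": "ROOT"}


def make_find(search_order):
    """Build one version of FIND from a search-order function."""
    def find(word, current):
        for name in search_order(current):
            if word in VOCABULARIES[name]:
                return name
        raise LookupError("NOT " + LABELS[current] + " WORD")
    return find


# Initial version: the current vocabulary only, ROOT is never searched.
naive_find = make_find(lambda current: [current])
# Most capable version: fall through to ROOT as well.
expert_find = make_find(
    lambda current: [current] if current == "ROOT" else [current, "ROOT"])


class System:
    def __init__(self):
        self.find = naive_find      # the execution vector

    def promote(self):
        self.find = expert_find     # repoint the vector, nothing else changes
```

The point of the design is that callers always go through the vector, so giving a user more capability is a single assignment rather than a change to every word that performs a search.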

There is no editor in the "initial" system because all
of the needed functions are available through the keyboard
keys, making the PDBMS a full-screen editor, albeit a small
screen editor. There is an editor vocabulary which is
defined in the PDBMS after ROOT and ASSEMBLER. This editor
is only needed once the user has begun working directly with
screens. Figure 5.1 shows the vocabulary structure of the
PDBMS. The concept of sealed vocabularies⁹ is employed;
however, notice that some words link one vocabulary
temporarily to others. For example, SEAL causes a search of
the Key vocabulary. SEAL and UNSEAL are defined in the DB
vocabulary to be themselves (i.e., they simply point to
their definitions in ROOT). This allows them to be used by
the naive user without directly accessing the root
vocabulary. E2PROM permanent vocabularies (i.e., Key, file,
and virtual block) are not linked through each other or those
vocabularies defined in RAM. Thus FORGETting a definition
in RAM which precedes a file, block, or key definition will
not erase any E2PROM definitions¹⁰.


⁹These are vocabularies which confine word searches to
themselves, and usually FORTH. The FIND used in fig-FORTH
searches all parent vocabularies of the current vocabulary.
The calculator and database vocabularies are fully sealed in
that not even the root vocabulary is searched.


¹⁰A sometimes problematic feature of standard FORTH is
that all definitions are actually maintained in one straight
linked list; vocabularies only describe search paths through
the one list. The traditional FORGET simply deletes all
definitions created after the definition to be

Figure 5.1 PDBMS Vocabulary Structure.

1. The Calculator


Initially the calculator is entered by pushing
shift-C. This places the user into the calculator context
whose vocabulary contains redefinitions of +, -, ×, and ÷ so
that they are infix operators. FIND has been modified so
that if a word is not found and an equal sign has been
previously interpreted, a constant is created. This allows
the user to store temporary results by creating "variables"
simply by using an undefined word. For example,


1 + B = A


would cause "A" to be created. If "B" had not been
previously defined an error condition would be raised when it
was not found in the dictionary. The equal sign is an input
immediate which causes "A" to be created, if need be, and
sets up an execution vector to cause the ENTER key to store
the top of the stack into "A."
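
The behaviour of the calculator line above can be modelled roughly as below. This is a sketch in Python under several assumptions that are not in the text: the function name enter, strict left-to-right evaluation with no operator precedence, integer operands only, and the ASCII symbols * and / standing in for the multiply and divide keys.

```python
import operator

# Infix operators of the assumed "naive" calculator vocabulary.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}


def enter(line, variables):
    """Interpret one calculator line when ENTER is pressed.

    `variables` maps names to values. A name after '=' is created if
    need be; an unknown name before '=' is an error.
    """
    stack = []
    pending = None          # infix operator waiting for its right operand
    seen_equals = False
    target = None
    for token in line.split():
        if token == "=":
            seen_equals = True
        elif seen_equals:
            target = token                  # created on demand
            variables.setdefault(target, 0)
        elif token in OPS:
            pending = OPS[token]
        else:
            if token.isdigit():
                value = int(token)
            elif token in variables:
                value = variables[token]
            else:
                raise LookupError(token + " is undefined")
            if pending is not None:
                stack.append(pending(stack.pop(), value))
                pending = None
            else:
                stack.append(value)
    if target is not None:
        variables[target] = stack[-1]       # ENTER stores top of stack
    return stack[-1]
```

For the example in the text, with B already holding a value, the line leaves the sum on the stack and in A; with B undefined, the lookup fails before anything is stored.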

Because a derivative of FORTH is used, floating
point arithmetic is not used. The system defaults provide
the user with a fixed two digits behind the radix point.
Like FORTH, the user may choose any base (radix) for
arithmetic operations, within the limits of the number of
input symbols available.
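
One common way to obtain two fixed digits behind the radix point with integer-only arithmetic is to scale every value by 100. The sketch below shows that technique in Python; it is an assumption about how such a default could be realized, not a description of the PDBMS code, and the function names are hypothetical. Only base 10 is handled.

```python
SCALE = 100   # two digits behind the radix point


def parse_fixed(text):
    """Convert text such as '1.5' to a scaled integer (150)."""
    sign = -1 if text.startswith("-") else 1
    whole, _, frac = text.lstrip("-").partition(".")
    frac = (frac + "00")[:2]          # pad or truncate to two digits
    return sign * (int(whole or "0") * SCALE + int(frac))


def format_fixed(value):
    """Convert a scaled integer back to text with two decimals."""
    sign = "-" if value < 0 else ""
    whole, frac = divmod(abs(value), SCALE)
    return "%s%d.%02d" % (sign, whole, frac)


def mul_fixed(a, b):
    # The product of two scaled values carries SCALE twice; remove one.
    return (a * b) // SCALE


def div_fixed(a, b):
    # Pre-scale the dividend so the quotient keeps two decimals.
    return (a * SCALE) // b
```

Addition and subtraction need no adjustment, since both operands carry the same scale. Note that the integer division used here rounds toward negative infinity for negative results, which a real implementation might treat differently.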


2. The Database


Initially the database management system is entered
by pushing shift-D. This vocabulary allows users to create
files, create records, retrieve records, update records,
delete records, and delete files. Additionally the user may
create and delete Keys, and use Keys to lock records and
other Keys.


¹⁰(continued) forgotten, even if they are not in the current
vocabulary. When there are multiple vocabularies, this can
create dangling pointers in vocabulary definitions.

a. Keyboard Key Definitions 


When the user is placed into the database
context the NEXT keys are redefined as described before.
Besides those two keys, the following shifted characters are
defined. These keys are described below. The word which
appears on the display and in the input message buffer when
the key is pushed is shown in parentheses.

(1). D (DELETE). This is used to delete a
file, record, or Key. There are three different DELETEs,
one in each of the DB, file, and Key vocabularies. Each
delete affects only those elements in its respective
vocabulary. The delete in the file vocabulary deletes files,
the one in the Key vocabulary deletes Keys, and the one in the
DB vocabulary deletes the current record.

(2). F (FILE). This word changes the context
for the interpretation of the words following it in the
input stream so that the file vocabulary is searched. The
context is reverted to the DB ("calling") vocabulary when
the first word not found in the file vocabulary is
encountered. The last filename mentioned before the context
is switched out of the file vocabulary becomes the "current
file."

(3). G (GET). This is used to initiate a
record retrieval. Table III shows a typical record
procedure. First the user is asked if the current file is the
one to be searched, or asked for a file if there is no
current file. Then the user is presented with the names of
the fields of the records in the file so the user can enter
values which are to be used as key attributes for retrieval.
If the user does not desire to enter a value for a
particular field, he simply presses the ENTER key. The query
in Table III is a request for any record in the ADDR-BK file
which contains "TABETHA" in its NAME field and "MONTEREY" or
"VA." in its CITY/ST field. Before actually performing a
retrieval operation, the user is asked if he still desires
to do the retrieval, allowing him to abort a query if he has
realized that he has made a mistake.


TABLE III 
Record Retrieval 


E ADDR-BK ? 


TEREY VA. 


1 RECORD FOUND 
PUSH NEXT 




(4). H (HIDE). This is used to make a Key
which has been made known through an UNSEAL operation,
unknown.

(5). K (KEY). This word changes the context
for the interpretation of the words following it in the
input stream so that the Key vocabulary is searched. As
with the shift-F, the context reverts to the calling
vocabulary when the first word not in the Key vocabulary is
encountered. This word does not affect any Keys or the Key
vocabulary; it is only used as a prefix word for MAKE and
DELETE.

(6). M (MAKE). This word, like DELETE, exists
in the DB, file, and Key vocabularies. Each different
version creates a record, file, and Key respectively.

(7). N (NO). This is used as an answer to
appropriate system prompts.

(8). P (PUT). This is analogous to SAVE-BUFFERS
and FLUSH in that it writes the current record to
secondary storage.

(9). R (RECORD). This word is included for
consistency reasons. It is used to preface DELETE and MAKE
when the user wishes to use the DB definitions of these
words. The DB DELETE and MAKE must be prefaced by RECORD so
that there is less chance of an accidental record deletion.

(10). S (SEAL). This is used to seal a Key or
the current record. It is simply defined as:

: SEAL ROOT SEAL ;

This allows the user access to the root word SEAL without
directly accessing the root vocabulary.

(11). U (UNSEAL). This word is used to unseal
all objects sealed with one or more Keys. It, like SEAL, is
simply defined in terms of the root word UNSEAL.

(12). Y (YES). This is used as an answer to
appropriate system prompts.
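
The way the prefix words FILE, KEY, and RECORD select which DELETE or MAKE is used, and the way the context falls back to the DB vocabulary, can be sketched as follows. This is a simplified Python model with hypothetical names (interpret, PREFIXES, DB_WORDS); it does not model the prompts that MAKE and DELETE issue, and it assumes that file and Key names are themselves words in their vocabularies.

```python
# Prefix word -> vocabulary it selects for the words that follow.
PREFIXES = {"FILE": "file", "KEY": "Key", "RECORD": "DB"}
# DB words usable without a prefix (DELETE and MAKE need RECORD).
DB_WORDS = {"GET", "PUT", "SEAL", "UNSEAL", "HIDE", "YES", "NO"}


def interpret(line, names):
    """Resolve each word of `line` to (vocabulary, word).

    `names` maps "file" and "Key" to the names defined there.
    Returns the resolved actions and the resulting current file.
    """
    context = None
    current_file = None
    actions = []
    for word in line.split():
        if context is not None:
            if word in ("DELETE", "MAKE"):
                actions.append((context, word))
                continue
            if word in names.get(context, ()):
                if context == "file":
                    current_file = word   # last file name mentioned
                actions.append((context, word))
                continue
            context = None                # first word not found: revert
        if word in PREFIXES:
            context = PREFIXES[word]
        elif word in DB_WORDS:
            actions.append(("DB", word))
        else:
            raise LookupError("NOT DATABASE WORD")
    return actions, current_file
```

In this model an unprefixed DELETE is rejected, which mirrors the stated reason for RECORD: reducing the chance of an accidental record deletion.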
b. File Creation


Files are created simply by using the words FILE 
and MAKE. Upon entering shift-F (or FILE) and shift-M (or
MAKE), the user needs only to follow the system's prompts. 
Table IV shows the file creation sequence. The user's input 
is underscored. The user always gets an additional field 
called "miscellaneous" added to the bottom of all records. 
This is included because it was found that people's personal 
data does not normally fit a uniformly structured record. 


c. File Deletion


File deletion is simply effected by the sequence
shown in Table V. File deletion is not a trivial matter
since the E2PROM is organized as a heap with physical
records containing a mixture of sealed Keys, DB dictionary
entries, and records from various files. A user cannot
delete a file unless he has unsealed all of the
records in it, so DELETE must make one pass of all the
records in the file to ensure that they are all unsealed.
If all of the records are unsealed, then a second pass is
made of the records reallocating all of the physical records
back to the system (i.e., setting their corresponding bit to
zero in the record bit map). Additionally, on this pass the
first byte of each physical record is set to 80H (the
system's Key) while the second byte is set to FFH (the null
Key). Then the DB dictionary must be searched for all
references to the deleted field numbers, and these must be
removed. When a field reference is removed from a wordd's
list of field IDs, the hole created by this deletion is
filled by moving the last entry on the list up to the
vacated spot. Physical records vacated by this operation
are returned to the system. Finally the file's vocabulary
and its field entries can be forgotten. Obviously file
deletion is a lengthy and complicated process.
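
The passes described above can be summarized in a short sketch. It is a Python model only; the data layout (dictionaries and lists standing in for E2PROM structures) and the name delete_file are assumptions made for illustration, and returning vacated dictionary storage to the system is not modelled.

```python
SYSTEM_KEY = 0x80   # first byte of a freed physical record
NULL_KEY = 0xFF     # second byte of a freed physical record


def delete_file(file, bitmap, storage, dictionary):
    """Delete `file`; return False if any record is still sealed.

    file:       {"field_ids": [...], "records": [{"sealed": bool,
                 "physical": [indices]}, ...]}   (assumed layout)
    bitmap:     one 0/1 entry per physical record
    storage:    physical record index -> bytearray
    dictionary: wordd -> list of field IDs that contain it
    """
    # Pass 1: every record must be unsealed.
    if any(rec["sealed"] for rec in file["records"]):
        return False
    # Pass 2: give the physical records back and stamp their key bytes.
    for rec in file["records"]:
        for idx in rec["physical"]:
            bitmap[idx] = 0
            storage[idx][0] = SYSTEM_KEY
            storage[idx][1] = NULL_KEY
    # Remove references to the deleted fields; fill each hole by moving
    # the last entry of the list into the vacated spot.
    doomed = set(file["field_ids"])
    for refs in dictionary.values():
        i = 0
        while i < len(refs):
            if refs[i] in doomed:
                refs[i] = refs[-1]
                refs.pop()
            else:
                i += 1
    file["records"].clear()
    return True
```

Moving the last entry into the hole keeps each list compact without shifting the remaining entries, at the cost of not preserving their order.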
TABLE IV
File and Key Creation


NAME?

FLD 1 NAME?
NAME

FLD 2 NAME?
STREET

FLD 3 NAME?
CITY/ST

FLD 4 NAME?
PHONE

FLD 5 NAME?
<enter>

FLD 5 MISC OK


Key Creation


KEY MAKE SECRET
OK



d. Key Creation


Creation of a Key is very simple, as shown in
Table IV. The example shows the creation of a key named
"SECRET." All that is required to create a Key is the
addition of "SECRET" into the Key dictionary as a constant and
initializing it to the next available Key ID number.

TABLE V
File, Key, and Record Deletion


File Deletion


Key Deletion


KEY SECRET DELETE
DELETE SECRET?
YES

DELETED OK


Record Deletion


RECORD DELETE
DELETE RECORD?
YES

DELETED OK

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e. Key Deletion 


Key deletion is accomplished in the same manner
by which files are deleted, as shown in Table V. Also like
file deletion, the mechanics of Key deletion are not
equivalent to a straightforward FORGET. Before a Key can be
deleted from the dictionary, all occurrences of the key in
the various access descriptors must be located and changed
to reflect the Key's deletion. This entails searching the
access descriptor of each screen, record and sealed Key and
converting the deleted Key's ID to FEH (the deleted Key ID).
After this is done the Key is deleted from the dictionary.
A sealed Key's physical record is returned to the system,
after setting the first byte to 80H (the system Key) and the
second byte to FFH (the null Key).
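
Key creation and deletion as described can be sketched together. This Python model makes several assumptions that are not in the text: access descriptors are represented as plain lists of Key IDs, "next available" is taken to mean the lowest unused ID, and the function names are hypothetical.

```python
SYSTEM_KEY = 0x80
DELETED_KEY = 0xFE   # ID left behind in descriptors for a deleted Key
NULL_KEY = 0xFF
RESERVED = {SYSTEM_KEY, DELETED_KEY, NULL_KEY}


def make_key(name, keys):
    """Add `name` to the Key dictionary with the next available ID."""
    key_id = 1
    while key_id in RESERVED or key_id in keys.values():
        key_id += 1
    keys[name] = key_id
    return key_id


def delete_key(name, keys, descriptors):
    """Rewrite every occurrence of the Key's ID, then forget the Key."""
    key_id = keys[name]
    for descriptor in descriptors:      # screens, records, sealed Keys
        for i, entry in enumerate(descriptor):
            if entry == key_id:
                descriptor[i] = DELETED_KEY
    del keys[name]
```

Because every descriptor is rewritten before the dictionary entry is removed, an ID that is later reused by a new Key cannot match any object sealed with the old one.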


f. Record Creation 


To the user, the record creation dialogue is similar
to the one associated with file creation. What is involved
is collecting the desired data, encoding it¹¹, finding
physical records to hold the logical record, and finally
linking the record into the parent file's linked list of
records. Currently the linked lists of records are maintained
in chronological order (i.e., as a circular queue). This may
be frustrating in some applications where the user would
like to peruse the database in some specified order. For
example, it is not possible to view the records of an
address book alphabetically by surname, unless they were
originally entered in that order. Because of the
unformatted nature of the fields, it is very difficult to
sort a file by key attributes.

It would not be too difficult to allow the user
to specify a record ordering other than chronological. This
could be done by allowing the user to flag a wordd in the
record as the sort-key-value (for example the last word in a
record starting with the character "@"). Then when the
record was PUT into the database, it would be inserted into
the file's linked list alphabetically relative to the other
"@-wordds" in the file's other records. So the user could
maintain the file sorted by surname by prefacing all
surnames with a "@"¹².


¹¹This includes converting [...] wordds, and then the
addition of the wordds into the dictionary.
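
The proposed ordering by a flagged wordd can be sketched as an insertion into an already ordered list. This is a Python illustration of the suggestion, not of anything implemented in the prototype; the record layout (a mapping of field name to text), the names sort_key and put_sorted, and the choice to place unflagged records at the end are all assumptions.

```python
def sort_key(record):
    """Return the last '@'-flagged wordd of the record, without the '@'."""
    flagged = [w for text in record.values()
               for w in text.split() if w.startswith("@")]
    return flagged[-1][1:] if flagged else None


def put_sorted(records, record):
    """Insert `record` so that flagged records stay in alphabetical order.

    Records without a flagged wordd are kept after all flagged ones,
    in the order they were PUT (an assumed policy).
    """
    key = sort_key(record)
    index = len(records)
    if key is not None:
        for i, other in enumerate(records):
            other_key = sort_key(other)
            if other_key is None or other_key > key:
                index = i
                break
    records.insert(index, record)
```

A linked list would perform the same scan and then relink at the insertion point; a Python list is used here only to keep the sketch short.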


TABLE VI
Record Creation

[table body not recoverable]



Table VI shows a typical record creation
sequence. Notice that no phone number was given; a null
entry is signalled by hitting the ENTER key. Also notice
that there is an implicit "current file." This file is the
last one referred to after the last use of FILE; had no file
been explicitly referenced before a record creation was
attempted, the PDBMS would have requested a file name. If
the file was not found, the user would have been asked if he
desired to create a file or abort the record creation.


¹²This may not appeal to many users, but it would not
necessarily have to appear in the name field;
"@-surname" could be placed in the "miscellaneous" field.
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g. Record Deletion 


Record deletion is requested by the user in the 
same fashion as file and key deletion. Record deletion 
involves first removing the record from the file's linked 
list by making the two records adjacent to the current 
record point to each other. These links are found in the 
current record's previous and next link bytes (see Figure 
4.5). Then all of the wordd references to the record in the 
DB dictionary must be deleted. Finally the physical records 
are returned to the system after setting the first byte to 
80H and the second to FFH. 
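
The unlinking step of record deletion can be sketched as below. This is a minimal Python model under an assumed layout (a mapping from record ID to its previous and next links); the name unlink_record is hypothetical, and the dictionary clean-up and the return of physical records are left out.

```python
def unlink_record(records, current):
    """Remove `current` from a circular, doubly linked list of records.

    records: record ID -> {"prev": ID, "next": ID}   (assumed layout)
    """
    prev_id = records[current]["prev"]
    next_id = records[current]["next"]
    # Make the two neighbours point to each other.
    records[prev_id]["next"] = next_id
    records[next_id]["prev"] = prev_id
    del records[current]
```

When the record is the only one in the file, its links point to itself, so the two assignments are harmless and the list simply becomes empty.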


h. Update 

Only records may be updated; files and keys
cannot. Records are simply updated by GETting them,
modifying them using the cursor control keys, and then
PUTting them back. Like FORTH, once a change has been made
to a record, it is marked as being updated, whether or not
the change is later undone in the same editing session. Once a
record has been marked as updated and it is PUT, the updated
record is added as a new record, and the old record is
deleted. This is not quite as drastic as it may sound. The
old record is used as a template for encoding the new
record. Wordds which are unchanged can be copied from the
old record directly into the new record. The old record
also contains all of the pointers into the DB dictionary
where new virtual addresses must be substituted, so the
dictionary must be searched only when a new wordd is added.
Record update is actually a record creation and deletion
operation.

It could be possible to allow file editing
(i.e., the addition and deletion of fields) by performing
the same type of operations as are employed in record update
(i.e., creating a new file, transferring the appropriate
data from the old file into the new file, and then deleting
the old file). However, this was considered too complicated
and slow to justify its inclusion for what would probably be
a rare event. Besides, by always including a "miscellaneous
field" in all records, it was felt that this would probably
not be a very necessary operation.

VI. SYSTEM SECURITY DESIGN

As stated earlier, security is important in a PDBMS
because of the personal nature of the information it may
contain. However, the type of security afforded in this
design is probably better suited for a larger system.
Probably all that is required for such a system as the PDBMS
is a simple mechanism which employs one Key or password.
This allows the user to hide anything he desires at one
level of security (i.e., one either has access to all of the
data or has access to only a subset of the data). The PDBMS
uses a much more elaborate system. This was done to test
two things: the feasibility of securing FORTH, and the
feasibility of implementing a security mechanism similar to
the one described in reference [10]. FORTH was chosen as
the language to implement the PDBMS with no firsthand
knowledge of the language. Because it is an interpreted
language, it was felt that there would be no problems with
securing the system. However, after receiving the FORTH
documentation and software many doubts were raised about
whether the language could be secured.

At first one thing which seemed essential to securing
the PDBMS was the restriction of the user's ability to use
assembly language. If the user can write words in assembly
language using physical addresses and ports (the only way to
write such words on the NSC9300 since it does not support
segmentation and privileged modes) there is almost no limit
to what he can do. All standard FORTHs are very close to
the hardware and allow words to be written in assembly
language, besides FORTH. As a matter of fact, it is so
close to the machine, that in 8080 fig-FORTH and FORTH-79,
it is impossible to prevent the programmer from writing
assembly language defined words without changing FORTH to
such an extent that it is no longer the same language. In
these two systems, the words which are used to specify code
definitions (;CODE, CODE, END-CODE, and {;S}) are all
high-level words (i.e., words written in FORTH as contrasted
to low-level words which are written in assembly language), as
is the assembler. As far as the author can determine, there
is no low-level word which can be "hidden" from the user
without having a detrimental effect and which is required
for entering assembly language defined words.

The word "hidden" is enclosed in quotes in the previous
paragraph because no word can be hidden from a user in his
address space. "Hidden" means that the user either does not
know of the hidden word's existence or does not know where to
find its definition, and he cannot execute it directly. Any
word in FORTH which can be located can be executed even if it
is not in the FORTH linked list word dictionary (one simply
puts the address of the first executable byte onto the
parameter stack and invokes EXECUTE). If a user is to be
allowed to program in FORTH, he must be allowed to access
words in the ROOT dictionary, and in order to access words,
their names must appear in the dictionary since FORTH searches
the dictionary by name. This makes it very easy for a user to
traverse the dictionary and look at its contents and at the
location of words. It would not be hard, though probably
tedious, to find a word not included in the dictionary by
checking for unaccounted gaps between words in the linked
list or finding a reference to a code field address of a
word which does not appear in the dictionary. If one were
to seriously consider hiding words, the best way to do this
would be to remove all of the headers (the name and link
fields) from all of the dictionary entries. Such a system
could not be extended because no words in the dictionary
could be found (since the name and link fields are necessary
to search for a word). If the PDBMS was to be secured there
had to be another method which either prevented the use of
assembly language or worked regardless of the fact that the
user could use assembly language.
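
The "unaccounted gaps" search mentioned above is easy to picture in code. The sketch below is a Python illustration only; it assumes that each dictionary entry reachable through the link fields can be summarized as a start address and a length, and the name find_gaps is hypothetical.

```python
def find_gaps(entries, start, end):
    """Return address ranges in [start, end) not covered by any entry.

    entries: (address, length) pairs for every word that can be reached
             by walking the dictionary's link fields (assumed known).
    """
    gaps = []
    cursor = start
    for address, length in sorted(entries):
        if address > cursor:
            gaps.append((cursor, address))   # space no header accounts for
        cursor = max(cursor, address + length)
    if cursor < end:
        gaps.append((cursor, end))
    return gaps
```

Any range reported this way is a candidate location for a headerless word, which is why simply unlinking a word from the dictionary does not hide it from a determined user.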

In the PDBMS, FORTH could possibly have been secured
entirely by using software and still allowed the user to
program in FORTH; however, it would have undoubtedly been a
very limited subset of the language. Such a system would
not have needed EPROM; instead a cold boot could have loaded
the system in from E2PROM. Verifying such a system would
have surely been a problem. Instead the PDBMS relies on
both hardware and software to enforce system security.


A. HARDWARE SECURITY MEASURES 


In multi-user systems hardware support of security is
essential; in truly secure systems it must be verified that
there are parts of the system that no one but system
administrators can access. In the PDBMS the hardware and
software enforce security to such an extent that even the
owner of the system cannot access parts of the system at
all¹³. This is desirable because it not only prevents other
persons who are not the owner of the PDBMS from compromising
or destroying the system, but it also prevents the user from
"terminally crashing" the system. Many of the system's boot
parameters are stored in EPROM and E2PROM. If these were
lost, the system could not be booted up.

It is the interaction of the EPROM and the "smart ports"
which is the hardware portion of the system's security.
Simply, the ports which control access to virtual memory,
the keyboard, and the RS232 port only accept instructions
executing from EPROM, as discussed in Chapter IV. Because
EPROM is read-only, the user is forced to use procedures in
it to access these external devices. Thus if the procedures
in EPROM can be verified to be not only correct but
also unsubvertable, then the PDBMS can probably be
made secure¹⁴.


¹³The PDBMS has not been proven correct and secure in
the sense of the ways described in references [11 and 12].
However, the author believes that it can be made secure and
rigorously proven to be so.


B. SOFTWARE SECURITY MEASURES


The hardware in itself does not guarantee a secure
system; there must be some verified software which operates
it. There are three different aspects of the software in
the PDBMS which are used to provide security. A fourth
aspect is mentioned here which is related to security but is
not involved in system security per se. The first three
items are: straight-through code, maintenance of system
parameters and tables in E2PROM, and Keys. The fourth item
is the FORTH concept of execution vectors.


Footnote 14: A correct procedure is one that does only what
it is designed to do; nothing more and nothing less.
Unsubvertability is a stronger condition than correctness in
that it means that even combinations of modules of correct
code and portions of modules cannot be made to interact
incorrectly. This is of concern in the PDBMS since the user
can read and execute the system's source machine code.

Contrary to FORTH programming style, words which are
involved with port access must be low-level and indivisible.
This means that these words must not be defined in terms of
other words; i.e., they cannot be colon definitions, they
must be code definitions. For example, it seems obvious
that one would like to write the following low-level words
for use in other system management words because they would
be very commonly used:

    E2PROM_ON       ( Turns E2PROM power on )
    E2PROM_WRT_ON   ( Turns E2PROM write power on )
    WRT_E2PROM      ( Initiates an E2PROM write )
    E2PROM_WRT_OFF  ( Turns E2PROM write power off )
    E2PROM_OFF      ( Turns E2PROM power off )


However, as mentioned before, if a word exists in the user's
address space, he can find it and execute it. This means
the user could find E2PROM_ON and E2PROM_WRT_ON, and execute
them from EPROM. Then, using his own assembly language
routines, he could manipulate the contents of the E2PROM.
The only way to prevent this is to create a minimum set of
virtual memory management words which, once execution of any
one of them begins, never branch out of the word or
return to the inner interpreter without first turning off
access to the ports. Also these words should be written so
that if the user jumps into the center of their code, they
are still correct.
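
The first of these two requirements can be illustrated with a small sketch. It is written in Python rather than in 8080 code, and the class SmartPort and the function write_e2prom are hypothetical names invented for the illustration; they are not PDBMS words. The sketch shows only the "never return with the port still enabled" rule. It does not model a user jumping into the middle of a routine, which has no counterpart in Python.

```python
class SmartPort:
    """Hypothetical stand-in for the E2PROM power and write-enable lines."""

    def __init__(self):
        self.power_on = False
        self.write_on = False
        self.cells = {}


def write_e2prom(port, address, value):
    """Enable, write, and disable as one indivisible routine.

    The try/finally guarantees that the port is shut off on every exit
    path, mirroring the rule that control never returns to the inner
    interpreter while access is still enabled.
    """
    try:
        port.power_on = True
        port.write_on = True
        port.cells[address] = value
    finally:
        port.write_on = False
        port.power_on = False


port = SmartPort()
write_e2prom(port, 0x0010, 0x5A)
```

After the call the value has been stored and both enable lines are off again, so nothing the caller does afterwards finds the port open.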

The first requirement is fairly easy to achieve
because these words are resident in EPROM; because they
cannot be altered, if a user jumps to, or into, them it can
be assured that he cannot affect the execution of the words.
The second requirement is much more problematic. Satisfying
it means that the actions of these code sequences must
maintain system security regardless of the actions performed
before and after their execution, and regardless of whether
the entire sequence is executed (i.e., the user jumps into
the middle of a code sequence). For example, the user must
not be able to use the code of one word (whether it is the
entire code sequence or a part of it) to set up the segment
register to point to the Key dictionary, and then by using
another word, retrieve the Key dictionary.

By controlling access to E2PROM it is possible to
use parts of it to store information which the user should
not have access to. Chapter IV discusses the information
which is stored in E2PROM and is not accessible to the
user. The locations of the parameters and beginnings of
these tables are static so that they may be referred to
directly by using their segment number and E2PROM addresses
(E000H through FFFFH). These references are found in EPROM
where they are visible to the user. The assurance that the
user cannot directly access these segments must be
incorporated into the design of the straight-through code.
The code must be written so that when control is passed from
the word to the inner interpreter, the user is left with no
more information about the tables and parameters than he is
authorized to access. Any routines which do system table
and parameter maintenance are designed so that they work
directly on the E2PROM and never bring the contents of these
segments into RAM. This makes it easier to ensure the
security of system segments.

The above is not entirely true of the PDBMS. During
retrieval operations, virtual addresses are brought into the
data buffers. Thus the user can gain some information about
the maintenance of the system's segments by dumping the
contents of these buffers. This information is kept in RAM
because maintaining it is a "write-intensive" operation.
Additionally it must be left in the buffers after the system
has finished processing the query because the virtual
addresses must be used to find the records which satisfy the
query conditions. The current record's virtual address is
needed so that if it is updated the location of the old
record can be found and deleted. Thus the user can gain
access to the virtual addresses of records to which he is
authorized. Allowing the user access to the virtual
addresses of all of the records which satisfy a query gives
him some information from which he can make inferences about
the allocation of physical records, including those to which
he is not authorized access. How much information can be
gained through inference seems to be limited by the fact
that the segments in which these records occur contain not
only records (which can use varying numbers of physical
records), but sealed Keys and DB dictionary entries (which
also use varying numbers of physical records). Additionally,
if any deletions or updates have ever occurred, the physical
records may no longer be allocated in a sequential and
chronological manner. Thus in a mature system (i.e., one
which has processed a number of Key and record additions and
deletions), it is questionable that much meaningful
inference can be done. Of course, the problem can be avoided
entirely by keeping all of these virtual addresses in E2PROM
at the expense of system speed and possible E2PROM
"burn-out."


3. Keys 


The proper implementation of Keys relies heavily
upon the preceding hardware and software base. Keys are
very simple: nothing is fetched from E2PROM unless the
proper Key(s) has been UNSEALed (or made known). The
operations associated with SEAL and UNSEAL affect the Key
dictionary but have no effect upon sealed objects. As
mentioned earlier, Keys are maintained in a dictionary as
constants. When a Key is UNSEALed, the high bit of its
character count byte is set to one. When a data object
fetch is requested, the object's access descriptor field is
"computed" to see if the requisite Keys have been previously
made known.

The access descriptor fields are limited to the
first physical record for screens (16 Keys), 15 Keys for a
sealed Key (one physical record less one byte for the sealed
Key's ID), and no limit for database records (since they are
permitted to cross physical record boundaries). However, for
consistency from the user's point of view, 15 Keys is the
limit for all system objects. The Keys may be "anded" and
"ored" with each other to form complicated access
mechanisms. This may be further extended by adding layers of
sealed Keys. For example, if access to the current record
required the Keys "CONFIDENTIAL" and "ACCESS," or the Keys
"SECRET" and "ACCESS," the current record could be sealed as
follows:


    CONFIDENTIAL ACCESS & SECRET & | RECORD SEAL
or
    CONFIDENTIAL SECRET | ACCESS & RECORD SEAL


where "&" is a logical "and" and "|" is a logical "or." If
CONFIDENTIAL's ID was one, SECRET's two, and ACCESS's three,
and the second example above had been used to seal the
record, then the record would have four key bytes which
would contain:

    01H 02H 83H FFH

Notice that the high bit of ACCESS's ID was set to one.
This signifies that it is to be "anded." A zero high bit
signifies the Key is to be "ored." Unique "access paths"
are described in both the SEAL process and the access
descriptor because they are specified using reverse Polish
notation.

When an attempted fetch of a record is made, the
fetch algorithm starts by setting a fetch flag to true (the
value one). Then it simply reads each Key ID from the
access descriptor and searches the Key dictionary to see if
the Key is known (i.e., the high bit of its character count
is set to one). If the Key is known, the search returns a
one, otherwise a zero. The result of the search is "anded"
or "ored" with the fetch flag according to the high bit of
the byte in the access descriptor. When the null Key is
found in the access descriptor, the value of the fetch flag
determines whether the object is sealed or unsealed.
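
The walk over an access descriptor may be sketched as follows. This is an illustration in Python, not the PDBMS code; the function fetch_allowed and the set known_ids are hypothetical names. One assumption is made that the text does not state: the result for the first Key replaces the initial flag rather than being combined with it, since "oring" into a flag that already holds one would leave it at one. The system Key and the deleted Key are outside the scope of the sketch.

```python
AND_BIT = 0x80   # high bit of a descriptor byte: 1 = "and", 0 = "or"
NULL_KEY = 0xFF  # terminates the access descriptor


def fetch_allowed(descriptor, known_ids):
    """Walk the descriptor bytes and fold each Key into the fetch flag."""
    flag = 1
    first = True
    for byte in descriptor:
        if byte == NULL_KEY:
            break
        key_id = byte & 0x7F
        known = 1 if key_id in known_ids else 0
        if first:
            # Assumption: the first Key seeds the flag.
            flag = known
            first = False
        elif byte & AND_BIT:
            flag &= known
        else:
            flag |= known
    return flag == 1


# (CONFIDENTIAL or SECRET) and ACCESS, using the IDs 1, 2 and 3.
record = bytes([0x01, 0x02, 0x83, 0xFF])
```

With this descriptor, knowing Keys 1 and 3 or Keys 2 and 3 permits the fetch, while knowing Keys 1 and 2 without ACCESS does not.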

Since the Key dictionary entries are maintained as a
FORTH dictionary and FORTH dictionaries are searched by
name, it may seem that searching the dictionary using the
Key's ID may be difficult. It is, in fact, faster than
searching by name. This is because of the structure of the
dictionary entries, which allows the Key's value to be
retrieved easily because it is located in the byte
immediately following the CFA. Searching by name is slower
because it involves string comparisons.

The root of the Key dictionary (i.e., that entry
whose link is equal to 0000H) is the definition of MAKE.
Below MAKE are all of the other colon definitions in the Key
vocabulary. After the last colon definition is the
definition of the system Key. This is a constant like the
other Keys but its value is 80H and its count byte contains
00H. This means that its name's length is zero, and thus it
has no name and cannot be found by a name search of the
dictionary. Because it cannot be found, it can never be
UNSEALed or made known, so the high bit of its character
count will always remain zero. Below the system Key are the
definitions of the null Key and the deleted Key. These
Keys' values are FFH and FEH respectively and their
character count bytes are equal to 80H. This means that they
also have no name and they always remain UNSEALed or known.
Because these three Keys' values are greater than 127, they
are always "anded" into any Key ID list in which they
appear.

Changing a deleted Key's ID number wherever it
occurs in an access descriptor list results in a "sensible"
condition. That is, all other Keys are still required in
their same logical relationship, except that the Key (or
relation) which preceded the deleted Key now takes the
place of the relation between itself and the deleted Key. A
major problem with deleting a Key is that the user may not
realize which data objects he is affecting or how he is
affecting them. This is an unresolved problem in the PDBMS
and it is more complicated than it appears on the surface.

Finally, there is one last important operation which
concerns maintenance of the Key dictionary: making Keys
unknown. The user can make Keys unknown on an individual
basis by using HIDE. For example,


Cote oa were real Ds 


makes "SECRET" unknown and seals any objects which are
sealed with SECRET. Whenever a non-maskable interrupt is
generated, the virtual memory manager makes all Keys whose
character count is greater than 80H unknown.
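
The use of the character count byte can be sketched as below. The code is an illustration in Python; the function names unseal, hide, and on_nmi are hypothetical and only model the bit manipulation described above, with a plain list standing in for the Key dictionary.

```python
KNOWN_BIT = 0x80  # high bit of the character count byte


def unseal(count_byte):
    """Make a Key known by setting the high bit of its count byte."""
    return count_byte | KNOWN_BIT


def hide(count_byte):
    """Make a Key unknown by clearing the high bit of its count byte."""
    return count_byte & ~KNOWN_BIT & 0xFF


def on_nmi(count_bytes):
    """Hide every Key whose count byte is greater than 80H.

    A count byte of exactly 80H (the nameless null and deleted Keys)
    or 00H (the system Key) is left untouched.
    """
    return [hide(c) if c > 0x80 else c for c in count_bytes]


secret = len("SECRET")          # 06H: named, unknown
secret_known = unseal(secret)   # 86H: named, known
after_nmi = on_nmi([secret_known, 0x80, 0x00, len("ACCESS")])
```

The comparison against 80H is what lets the interrupt handler hide every named Key while leaving the null and deleted Keys known.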


4. Execution Vectors


Execution vectors are used in the PDBMS to allow the
user to interact with only that part of the system which he
understands. However, they can be used to provide system
security to an extent. Simply, if a user does not know how
to change a vector's value (or a collection of vectors) or
what value to change it to, the situation is similar to
needing a password to access a more powerful system. At the
lowest level it is easy to prevent a user from using more of
the system than is desired. If the user is constrained to a
vocabulary which does not contain words which would allow
him to make colon definitions (e.g., {:}) or access memory
directly (e.g., {!}, {@}, etc.), the inner workings of the
system can be hidden from him. Making a user more
privileged simply means giving him the name of a word which
changes the values of the execution vectors (of course this
word cannot appear in a listing of the vocabulary). As the
system to which the user gains access becomes more powerful,
it becomes progressively harder to provide system security
by using execution vectors without relying upon hardware.
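
The idea of a vector as a changeable slot can be sketched as follows. The sketch is in Python and every name in it (restricted_interpret, full_interpret, promote, and the vector name INTERPRET) is hypothetical; it shows only the dispatch mechanism, not how the PDBMS stores or names its vectors.

```python
def restricted_interpret(word, vocabulary):
    """Refuse words that define new words or touch memory directly."""
    if word in (":", "!", "@"):
        return "unknown word"
    return vocabulary.get(word, "unknown word")


def full_interpret(word, vocabulary):
    """Look every word up without filtering."""
    return vocabulary.get(word, "unknown word")


# The vector is just a slot holding the routine that currently runs.
vectors = {"INTERPRET": restricted_interpret}
vocabulary = {"FIND": "ok", "!": "ok", ":": "ok"}


def execute(word):
    """Dispatch through the vector instead of calling a fixed routine."""
    return vectors["INTERPRET"](word, vocabulary)


def promote():
    """Hypothetical unlisted word that repoints the vector."""
    vectors["INTERPRET"] = full_interpret


before = execute("!")
promote()
after = execute("!")
```

The protection rests entirely on the user not knowing the name of the promoting word, which is why the text treats it as comparable to a password.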
A good description of the concepts upon which FORTH is
based may be found in reference [13]. FORTH is a
stack-oriented, threaded, interpretive language. It is noted
for its compact size and fast execution (compared to other
interpreted languages such as BASIC). The 8080 fig-FORTH
model (version 1.3) occupies less than 9K bytes of memory
(which includes the first page of memory occupied by CP/M).
Residing in that 9K is the FORTH interpreter, compiler,
dictionary, and a line editor. There are two "generic"
FORTHs. The older version is usually referred to as
"fig-FORTH"; the newer version is usually referred to as
"FORTH-79." FORTH-79 was designed to be a standard which
establishes the minimum requirements of the language.
Specifically, reference [2] states that the purpose of
FORTH-79 is

    ... to allow transportability of standard FORTH programs
    in source form among standard FORTH systems. A standard
    program shall execute equivalently on all standard FORTH
    systems.

The bibliography contains a list of sources used by the
author while learning FORTH. Anyone who seriously desires
to understand the language should have at least some of
these books and pamphlets.


A. WORDS


The basic unit of the language is a "word." Words can
be "colon definitions" (analogous to functions and
procedures in other languages), variables, and constants. New
words are defined in terms of previously defined words,
making the language extensible. Defined words are kept in a
linked list called the "dictionary." The dictionary is
maintained as a stack (Last-In-First-Out or LIFO) so that
the newest words are searched first. Thus previously
defined words can be redefined. Dictionary entries are
pruned by using the word FORGET. When a word is
"forgotten," all words defined after it are also forgotten.
Rather than a straight linked list, the dictionary can be
extended in a tree structure where branches denote different
contexts. Table VII is a list of the FORTH-79 required
words. The words in lower-case are dictionary entries for
the run-time code for the corresponding compiling word.
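
The behavior of the dictionary under redefinition and FORGET can be sketched with a simple linked list. The Python class Dictionary below is a hypothetical model; real FORTH entries carry name, link, code, and parameter fields rather than Python tuples, and the word bodies shown are placeholder strings.

```python
class Dictionary:
    """Linked list of (name, body, link) entries, newest entry first."""

    def __init__(self):
        self.latest = None

    def define(self, name, body):
        # Each new entry links back to the previously newest entry.
        self.latest = (name, body, self.latest)

    def find(self, name):
        entry = self.latest
        while entry is not None:
            if entry[0] == name:
                return entry[1]
            entry = entry[2]
        return None

    def forget(self, name):
        # Unlink the newest definition of `name` and everything after it.
        entry = self.latest
        while entry is not None:
            if entry[0] == name:
                self.latest = entry[2]
                return True
            entry = entry[2]
        return False


d = Dictionary()
d.define("SQUARE", "DUP *")
d.define("CUBE", "DUP SQUARE *")
d.define("SQUARE", "DUP DUP * DROP")   # redefinition shadows the old one
shadowed = d.find("SQUARE")
d.forget("CUBE")                        # also drops the second SQUARE
restored = d.find("SQUARE")
```

Because the search starts at the newest entry, the second SQUARE hides the first until CUBE is forgotten, at which point the original definition is found again.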


B. SYSTEM DATA STRUCTURES


Figure A.1 depicts the standard FORTH memory
organization. The user dictionary grows up towards high
memory while the parameter stack grows down towards the
dictionary. The unused portion of memory separating the two
is called the pad. The beginning of the pad moves up in
memory with the dictionary pointer (DP). It is usually
located 44H bytes in front of the DP. Likewise, the input
message buffer grows up in memory according to the size of
the input message while the return stack grows down towards
the message buffer.

Figure A.1 Standard FORTH Memory Map.

The parameter stack is used for mathematical data
manipulations and parameter passing. The data on the stack
is operated upon using reverse Polish (or postfix) notation,
similar to Hewlett-Packard calculators. The return stack is
used by FORTH for storing the interpreter pointer (the
address of the next higher context, i.e., the calling word).
The pad is used primarily for string manipulations. System
variables are those variables maintained and used by FORTH
and not directly accessible to the programmer. User
variables are declared, maintained and used by the system,
but are directly accessible to the programmer. Examples of
system variables are cold boot parameters and CP/M disk
interface parameters, while examples of user variables are
the dictionary pointer, the current radix (called BASE), and
the current execution state (called STATE).
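
Postfix evaluation on a parameter stack can be sketched in a few lines. The Python function eval_postfix is a hypothetical illustration limited to integers and three operators; it is not a FORTH interpreter.

```python
def eval_postfix(source):
    """Evaluate a postfix expression of integers and + - * operators."""
    stack = []
    for token in source.split():
        if token in ("+", "-", "*"):
            right = stack.pop()
            left = stack.pop()
            if token == "+":
                stack.append(left + right)
            elif token == "-":
                stack.append(left - right)
            else:
                stack.append(left * right)
        else:
            stack.append(int(token))
    return stack


result = eval_postfix("3 4 + 5 *")
```

Each operator consumes its operands from the top of the stack and leaves its result there, so no parentheses or precedence rules are needed.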

The number of block buffers is dependent upon the amount
of physical memory available. Standard FORTH blocks are 1K
bytes in size and are stored in secondary storage, thus
giving FORTH what its users call virtual memory. FORTH
automatically allocates buffers as they are needed according
to which buffers have not been allocated yet, the age of the
contents of occupied buffers, and whether any buffers
contain updated data. Blocks containing FORTH "programs"
are commonly referred to as "screens" because they are
formatted for CRT display; i.e., 16 lines of 64 characters.
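
The relationship between a 1K block and a screen (16 lines of 64 characters is exactly 1024 bytes) can be shown with a short sketch. The Python function block_to_screen is a hypothetical helper, and the sample source line is made up for the illustration.

```python
BLOCK_SIZE = 1024
LINE_WIDTH = 64


def block_to_screen(block):
    """Split one 1K block into its 16 display lines of 64 characters."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("a block is exactly 1024 bytes")
    return [block[i:i + LINE_WIDTH] for i in range(0, BLOCK_SIZE, LINE_WIDTH)]


source = ": SQUARE DUP * ;"
block = source.ljust(BLOCK_SIZE)      # pad with blanks to fill the block
screen = block_to_screen(block)
```

No line terminators are stored; the line structure exists only in how the block is displayed.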


C. THE MECHANICS OF FORTH


There are fewer than 70 assembly routines in FORTH-79,
most of which are less than 20 instructions long. When
FORTH words are interpreted, it is these routines which
ultimately are executed, except in the case of user code
defined words. All words in FORTH contain a code field
address (CFA) which is a pointer to an assembly language
routine which defines the word's run-time behavior. A
constant's CFA points to constant, which is an assembly
language routine which places the contents of the two bytes
following the CFA onto the parameter stack. A code defined
word's CFA simply points to the byte following the CFA, the
beginning of the word's code definition.

The CFA of a colon definition points to colon. See
Figure A.2 for the structure of a colon definition in the
PDBMS. This routine has different actions, depending upon
the specific version of FORTH (i.e., whether the system
increments the interpreter pointer before executing a word,
or after). In general though, colon pushes the current
value of the interpreter pointer (which points to the
current word being executed in the post-incrementing
systems) onto the return stack and then sets the interpreter
pointer equal to the contents of the first two bytes
following the current word's CFA. These two bytes contain a
pointer to the CFA of the first word in the currently
executing word's parameter field address (PFA). Thus the
execution of a word describes an inorder traversal of a tree
of FORTH words used to define a word and all words used in
those definitions, etc. Leaves on this tree are code
defined words, constants, variables, user variables, and
other data types; nodes are colon definitions.

Complementing colon is semicolon. This is the run-time
code of {;} which is the last word in every colon
definition. What semicolon does is simply pop the return
stack and set the interpreter pointer equal to the popped
value. This causes execution to move one layer higher in the
tree described above. The topmost word in the tree is QUIT,
which is an infinite loop. So when the interpreter
completes the execution or compilation of a word, execution
returns to QUIT which loops waiting for more input.
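
The interplay of colon, semicolon, and the return stack can be sketched in Python. This is a simplified model, not the fig-FORTH or PDBMS implementation: Python lists stand in for parameter fields, list indices stand in for addresses, and the function run and the sample words are hypothetical. Reaching the end of a list plays the part of semicolon.

```python
def run(word, words, stack):
    """Execute `word` using an explicit return stack.

    `words` maps a name either to a Python callable (a code defined
    word) or to a list of names (a colon definition).
    """
    return_stack = []
    thread, ip = [word], 0          # ip indexes the next word in the thread
    trace = []
    while True:
        if ip == len(thread):
            # "semicolon": pop the caller's position off the return stack
            if not return_stack:
                return trace
            thread, ip = return_stack.pop()
            continue
        name = thread[ip]
        ip += 1
        body = words[name]
        if callable(body):
            trace.append(name)
            body(stack)             # leaf: run the code defined word
        else:
            # "colon": save where to resume, then enter the definition
            return_stack.append((thread, ip))
            thread, ip = body, 0


def dup(stack):
    stack.append(stack[-1])


def mul(stack):
    right = stack.pop()
    stack.append(stack.pop() * right)


words = {
    "DUP": dup,
    "*": mul,
    "SQUARE": ["DUP", "*"],
    "FOURTH": ["SQUARE", "SQUARE"],
}

stack = [3]
trace = run("FOURTH", words, stack)
```

The trace lists only the leaves of the tree, in the order they are reached, while the colon definitions FOURTH and SQUARE appear solely as pushes and pops on the return stack.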

The heart of FORTH is the inner interpreter. In the
8080, Z80, and NSC800 all this short code routine does is
take the interpreter pointer and push it into the program
counter. This technique of passing control from word to
word makes FORTH almost incomprehensible until the entire
system is understood. Because FORTH uses almost no
subroutine calls and jumps, flow of control is not
immediately apparent. In the 8080 fig-FORTH (version 1.3)
almost the entire FORTH system past the first 1K bytes
consists of "DB" and "DW" instructions (see footnote 15).
Like LISP, most of FORTH consists of data structures which
can be used as data or executable code.

Footnote 15: The "DB" (Define Byte) and "DW" (Define Word)
instructions are 8080 assembly language pseudo-instructions
which are used to insert data into code areas. For example,
the FORTH message "OK" (followed by a carriage return and
line feed) is inserted into the source code of FORTH by
using the "DB" instruction as follows:

    DB 'OK',0DH,0AH


APPENDIX B
STUDY STATISTICS


A. BACKGROUND 


In order to understand what might be involved in a
Personal Database Management System, four address books were
studied in detail. The results of this study served as a
basis for much of the design of the PDBMS. It should be
pointed out that the results of this study are probably not
indicative of the American population as a whole. The books
were not selected on any scientific basis and had the
following important characteristics which probably skewed
the findings:


* All of the books belonged to friends and neighbors of
  the author in California. Thus many addresses, zip
  codes, area codes, etc., had common values.

* All of the books were kept for families and not
  individuals. The effect of this is uncertain, but because
  of this, entries in these books fell into four distinct
  categories:

  - The husband's relatives (characterized by similar
    names, cities, states, zip codes, etc.).

  - The wife's relatives (having the same characteristics
    as mentioned above).

  - Local friends (characterized by similar cities, states,
    zip codes, telephone area codes and exchanges, etc.).

  - Non-local friends (which had little in common, except
    perhaps the military in many cases).

* All of the families had at least one member in the armed
  forces. This seemed to introduce many acronyms and
  abbreviations which are probably not very common in
  civilian spheres. This probably also accounted for a
  larger than usual number of "non-local friend" entries.


B. METHOD OF ANALYSIS


Each of the books was recorded into a file of its own in
a fashion which changed it as little as possible from the
original. Non-alphabetic and graphic symbols were
represented by their closest ASCII equivalent, if there was
one. Otherwise an alternate such as "0" was chosen.
Statistical analysis was performed on these files but is not
included because the files included lower-case letters and a
large number of spaces (used for formatting). It was felt
that these conditions made these first sets of files
inappropriate for use with the PDBMS.

After the above files had been created, they were
copied to another set of files. In transferring the
data, all lower-case letters were converted to upper-case
and multiple spaces were removed. Tables VIII [...]
XVI, and XVII present the results of the analysis of these
files.
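
The conversion from the first set of files to the second can be sketched as a single text transformation. The Python function normalize is a hypothetical reconstruction of the two stated steps (upper-casing and removal of multiple spaces), and the sample entry is invented for the illustration; it is not taken from the address books.

```python
import re


def normalize(line):
    """Upper-case a line and collapse each run of spaces to one space."""
    return re.sub(r" {2,}", " ", line.upper())


entry = "Smith,  John    12 Oak St.   Monterey, Ca"
normalized = normalize(entry)
```

Punctuation and single spaces are left alone, so the field structure of each entry is unchanged.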

Finally this second set of files was copied to a third
set using a transformation which was designed to reduce the
skewness of the letter and digit distributions. This was
done at a time when it had not yet been decided not to use
text compression. Many text compression techniques require
knowledge of the distribution of the symbols. It was hoped
that something close to the letter distribution of standard
English would be obtained. The tables which use the label
"After" reflect the data gained from analyzing this last set
of files. The distribution of the letter frequencies for
English was taken from reference [14]. What follows are
the rules applied to the second set of files to produce the
third set. They are listed in the order in which they were
applied.

* Remove all redundant surnames.

* Remove all redundant city names for cities in the same
  state. Any form of the name is removed (including
  abbreviations), leaving the longest form.

* Remove all redundant zip codes.

* Remove all redundant telephone exchange numbers within
  the same area code.

* Remove all area codes and state names.

* Remove the first three digits of each zip code. These
  digits indicate the postal geographical region (the first
  digit) and the city or region within it (second and third
  digits).

The data in the first and second sets of files, though
obviously address book data, could not be used as a
representative sample of the "average" American address book.
For example, 310 (6 percent) of the wordds in the address
books refer to the states of California, Maryland, North
Carolina, New York, Virginia, and Washington. This would
probably serve as a poor basis for predicting the contents
of the address book of someone living in Chicago. For this
reason the above transformation was used in an attempt to
remove the influence of family names and geographical
locations from the data, yielding a sample more
representative of an "average" address book. Because the
PDBMS is not designed to handle only one specific person's
information, an average address book was needed in order to
determine the utility of algorithms and data structures. If
the address books had been found to contain almost no
redundancies, then the idea of using a DB dictionary
probably would have been discarded.


C. RESULTS OF THE ANALYSIS


In the tables appearing in this appendix, the words
"wordd," "char," and "punctuation" are used to connote the
definitions ascribed to them in Table II. The word
"character" is used to mean all printing ASCII characters
and the space. All percentages, except those in Table X,
reflect the percentage of all characters.


1. General Statistics 


The difference between the number of unique wordds
in Tables VIII and IX is a result of the reduction of zip
codes to their last two digits. The differences are equal
to the number of unique zip codes. Also notice that the sum
of the unique wordds in the four books is not equal to the
number in the total column. This is because the total shown
is the number of unique wordds in all four books as a whole.
Lastly, the reduction of the number of characters includes
not only those chars in the deleted wordds, but also the
punctuation following the ends of and between the wordds
deleted during the creation of the third set of files.


2. Wordd Length


Table X indicates that the PDBMS, as it is designed,
is not as efficient with memory when compared to a system
which simply inserted plain text (i.e., did not use a DB
dictionary, etc.). Between the DB dictionary and the
logical records, every unique wordd in the PDBMS requires at
least nine bytes (seven for the DB dictionary entry and two
in the logical record). Wordds which are duplicates of
wordds previously entered into the PDBMS require five bytes
(three in the DB dictionary used for the field ID and the
pointer to the physical record, and two in the logical
record used for the first letter of the wordd and the
wordd's ID). Using the numbers in Table X, the average
wordd length in the four books is 4.37 chars.

In order to be better than or equivalent to a system using
plain text in [...] records requires highly redundant
information. The four books together require approximately
34K bytes of storage as plain text (this includes
administrative overhead). However, this does not include the
storage required for indices needed to provide random
access; only sequential access is possible with only 34K
bytes of storage. Based upon the data derived from the four
books, the PDBMS would require approximately 45K bytes to
store the same information (27K bytes for the dictionary and
18K for the files, again including administrative overhead).
However, unlike the 34K bytes above, this 45K bytes includes
storage dedicated to providing random access.


TABLE VIII
General Statistics - Before

                 Book 1   Book 2   Book 3   Book 4    Total
Records              80      129       88      111      408
Fields              340      472      346      350     1508
Characters         6173     8409     5908     6248    26738
Chars              5049     6639     4809     5163    21660
Wordds             1124     1574     1134     1129     4961
Unique Wordds       749      958      740      723     3170


TABLE IX
General Statistics - After

                 Book 1        Book 2        Book 3   Book 4   Total
Records              80           129            88      111     408
Fields              340           472           346      350    1508
Characters   [illegible]   [illegible]         4928     5134   22617
Chars              4385    [illegible]         3834     4069   [illegible]
Wordds             1008          1329           941      925    4203
Unique Wordds       722           912           704      678    3016


TABLE X
Wordd Length Distribution

Wordd Length   Frequency       %
      1             310     6.25
      2             728    14.67
      3             939    18.93
      4             800    16.13
      5             936    18.87
      6             427     8.61
      7             348     7.01
      8             243     4.90
      9             116     2.34
     10              61     1.23
     11              36     0.73
     12              16     0.32
     13               1     0.02
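
The average wordd length quoted from Table X can be checked with a short calculation. The Python sketch below uses the frequencies as read from Table X; the function pdbms_bytes is a hypothetical helper that applies only the per-wordd byte counts stated in the text (nine bytes for a unique wordd, five for a duplicate) and ignores administrative overhead.

```python
# Frequencies by wordd length, as read from Table X.
LENGTH_FREQUENCY = {
    1: 310, 2: 728, 3: 939, 4: 800, 5: 936, 6: 427, 7: 348,
    8: 243, 9: 116, 10: 61, 11: 36, 12: 16, 13: 1,
}

total_wordds = sum(LENGTH_FREQUENCY.values())
total_chars = sum(length * count for length, count in LENGTH_FREQUENCY.items())
average_length = total_chars / total_wordds


def pdbms_bytes(unique_wordds, duplicate_wordds):
    """Lower bound: nine bytes per unique wordd, five per duplicate."""
    return 9 * unique_wordds + 5 * duplicate_wordds
```

The totals come to 4961 wordds and 21660 chars, which agree with the "Before" totals in Table VIII, and their ratio rounds to the 4.37 chars given above. Since a duplicate costs five bytes against an average of 4.37 chars of plain text, the dictionary pays off mainly on longer and frequently repeated wordds.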

3. Letter, Digit, and Punctuation


Tables XI, XII, XIII, XIV, XV, and XVI present data
on the symbols found in the four address books. Notice from
Table XVI that it is obvious that these books are not
samples from normal English text. For the most part, the
books are "fairly uniform" in their use of letters and
digits; this is not the case with punctuation. Book 1 is
distinctive in that it is the only one where a dollar sign,
colons, and semicolons appear. Book 2 uses an unusually
large number of "other" punctuation characters. These
punctuation characters are those which were used to
represent graphic, non-alphabetic symbols. Book 4 is unlike
the others in that it uses the plus sign as the abbreviation
for the word "and" whereas the other books use the
ampersand. Book 4 also contains a relatively small number of
parentheses, dashes, periods, and "others" compared to the
other books.

4. Ipits 


Tables XVII and XVIII show the distribution of all
alphabetic wordds in the four books as a whole by their
first letter. What is shown in the "Most Frequent Wordds"
column are those wordds which account for approximately 30
percent of the total number of wordds starting with the
letter in the corresponding first column. Notice that
surnames, cities, and states do not appear in Table XVIII
because only one occurrence of each remains in the third
set of files. One noticeable exception is the towns of
Westminster. The wordd appears in Table XVIII because three
different towns occur in the four different books
(Westminster, California; Westminster, Colorado; and
Westminster, Maryland). As proof of the skewed nature of
information notice the large number of occurrences of the
[...]

These two tables also support the premise that these
address books are not from normal English text. The English
words "THE," "OF," and "AND" make up 13.75 percent of all
words in English text. These same words make up less than
one percent of the wordds in the address books. In fact,
less than one percent of the wordds in the four address
books are among the 46 most frequently occurring words in
the English language. These 46 words account for more than
41 percent of all words in English text [15].